# Saving Time with Views, Functions, & Triggers

One advantage of using a programming language is that we can automate repetitive, boring tasks. We can take queries or steps we might otherwise run over & over & turn them into reusable database objects that we code once & call later, letting the database do the work. This follows the DRY principle: Don't Repeat Yourself.

---

# Using Views to Simplify Queries

A *view* is essentially a stored query with a name that you can work with as if it were a table. For example, a view might store a query that calculates total population by state. As with a table, you can query that view, join the view to tables (or to other views), & use the view to update or insert data into a table it's based on, albeit with some caveats. The stored query in a view can be simple, referencing just one table, or complex, with multiple table joins.

Views are especially useful in the following scenarios:

* **Avoiding duplicate effort:** They let you write a complex query once & access the results when needed.
* **Reducing clutter:** They can trim the amount of information you need to wade through by showing only columns relevant to your needs.
* **Providing security:** Views can limit access to only certain columns in a table.

We'll look at two kinds of views. The first -- a standard view -- contains PostgreSQL syntax that's largely in line with the ANSI SQL standard for views. Every time you access a standard view, the stored query runs & generates a temporary set of results. The second is a *materialised view*, which is specific to PostgreSQL, Oracle, & a limited number of other database systems. When you create a materialised view, the data returned by its query is stored permanently in the database like a table; you can refresh the view to update the stored data if needed.

## Creating & Querying Views

We'll return to the census estimates table `us_counties_pop_est_2019`. The query below creates a standard view that returns just the population of Nevada counties. The original table has sixteen columns; the view will return just four of them. This is useful for making a subset of Nevada census data quickly accessible when we refer to it often or use the data in an application.

```
CREATE OR REPLACE VIEW nevada_counties_pop_2019
AS (
    SELECT county_name,
           state_fips,
           county_fips,
           pop_est_2019
    FROM us_counties_pop_est_2019
    WHERE state_name = 'Nevada'
);
```

We define the view using the keywords `CREATE OR REPLACE VIEW` followed by the view's name, `nevada_counties_pop_2019`, & then `AS`. Next, we use a standard SQL `SELECT` to fetch the 2019 population estimate (the `pop_est_2019` column) for each Nevada county from the `us_counties_pop_est_2019` table.

Notice the `OR REPLACE` keywords after `CREATE`. These are optional & tell the database that if a view with this name already exists, it should be replaced with the new definition. It's helpful to include these keywords if you're iterating on creating a view & want to refine the query. There is one caveat: if you're replacing an existing view, the new query must generate the same column names with the same data types & in the same order as the one it's replacing. You can add columns, but they must be placed at the end of the column list. If you try to do otherwise, the database will respond with an error message.
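To make the caveat concrete, here is a minimal sketch using a throwaway view named `nevada_counties_demo`. That name is hypothetical, chosen so the view from this lesson stays untouched; the sketch assumes only the `us_counties_pop_est_2019` table.

```sql
-- Create a scratch view with two columns.
CREATE OR REPLACE VIEW nevada_counties_demo AS
SELECT county_name,
       pop_est_2019
FROM us_counties_pop_est_2019
WHERE state_name = 'Nevada';

-- Allowed: the existing columns keep their names, types, & order,
-- & the new column is appended at the end of the list.
CREATE OR REPLACE VIEW nevada_counties_demo AS
SELECT county_name,
       pop_est_2019,
       county_fips
FROM us_counties_pop_est_2019
WHERE state_name = 'Nevada';

-- Not allowed: swapping the first two columns changes the name
-- in each existing position, so PostgreSQL rejects the statement.
-- Uncomment to see the error.
-- CREATE OR REPLACE VIEW nevada_counties_demo AS
-- SELECT pop_est_2019,
--        county_name,
--        county_fips
-- FROM us_counties_pop_est_2019
-- WHERE state_name = 'Nevada';

-- Clean up the scratch view.
DROP VIEW nevada_counties_demo;
```

When a change doesn't fit these rules, the usual route is to drop the view & create it again with the new definition.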

Run the query using pgAdmin. The database should respond with the message `CREATE VIEW`. To find the new view, in pgAdmin's object browser, right-click the `analysis` database & click **Refresh**. Choose **Schemas -> public -> Views** to see all views. When you right-click your new view & click **Properties**, you should see a more verbose version of the query (with the table name prepended to each column name) on the code tab in the dialogue that opens. That's a handy way to inspect any view you might find in a database.

<img src = "Inspecting Views.png" width = "600" style = "margin:auto"/>

This type of view -- one that isn't materialised -- holds no data at this point; instead, the stored `SELECT` query it contains will run when you access the view from another query. For example, the query below returns all columns in the view. As with a typical `SELECT` query, we can use `ORDER BY` to sort results, this time using the county's Federal Information Processing Standards (FIPS) code -- the standard designator the US Census Bureau & other federal agencies use to specify each county & state. We also add a `LIMIT` clause to display just five rows.

```
SELECT *
FROM nevada_counties_pop_2019
ORDER BY county_fips
LIMIT 5;
```

<img src = "Querying the nevada_counties_pop_2019 View.png" width = "600" style = "margin:auto"/>

This simple example isn't useful unless quickly listing Nevada county population is a task you'll perform frequently. So, let's imagine a question data-minded analysts in a political research organisation might ask often: what was the percent change in population for each county in Nevada (or any other state) from 2010 to 2019?

Rather than rewrite that calculation every time, we'll create a view that stores the query, as shown below:

```
CREATE OR REPLACE VIEW county_pop_change_2019_2010
AS (
    SELECT c2019.county_name,
           c2019.state_name,
           c2019.state_fips,
           c2019.county_fips,
           c2019.pop_est_2019 AS pop_2019,
           c2010.estimates_base_2010 AS pop_2010,
           round((c2019.pop_est_2019::numeric -
               c2010.estimates_base_2010) /
               c2010.estimates_base_2010 * 100, 1)
               AS pct_change_2019_2010
    FROM us_counties_pop_est_2019 AS c2019
    JOIN us_counties_pop_est_2010 AS c2010
        ON c2019.state_fips = c2010.state_fips
            AND c2019.county_fips = c2010.county_fips
);
```

We start the view definition with `CREATE OR REPLACE VIEW`, followed by the name of the view & `AS`. The `SELECT` query names columns from the census tables & includes a column definition with a percent change calculation. Then we join the 2019 & 2010 census tables using the state & county FIPS codes. Run the query & the database should again respond with `CREATE VIEW`.

Now that we've created the view, we can run a simple query against it to retrieve data for Nevada counties, as shown below.

```
SELECT county_name,
       state_name,
       pop_2019,
       pct_change_2019_2010
FROM county_pop_change_2019_2010
WHERE state_name = 'Nevada'
ORDER BY county_fips
LIMIT 5;
```

In the query, we specify four of the `county_pop_change_2019_2010` view's seven columns. One is `pct_change_2019_2010`, which returns the result of the percent change calculation we're looking for. We're also filtering the results using a `WHERE` clause, similar to how we'd filter any query.

After querying the four columns from the view, the results should look like this:

<img src = "Selecting Columns from the county_pop_change_2019_2010 View.png" width = "600" style = "margin:auto"/>

Now we can revisit this view as often as we like to pull data for presentations or to answer questions about the percent change in population for any county in America from 2010 to 2019.

Looking at just these five rows, you can see a couple of interesting stories emerge: the continued rapid growth of Clark County, which includes the city of Las Vegas, as well as a strong percent increase in Esmeralda County, one of the smallest counties in the United States & home to several ghost towns.
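The view isn't limited to Nevada. As a small sketch, the query below reuses it to rank the counties of another state by growth. The state name `'Texas'` is only an illustrative choice, & the query assumes the table spells state names out in full, as it does for Nevada.

```sql
SELECT county_name,
       pop_2010,
       pop_2019,
       pct_change_2019_2010
FROM county_pop_change_2019_2010
WHERE state_name = 'Texas'
ORDER BY pct_change_2019_2010 DESC
LIMIT 5;
```

Only the `WHERE` & `ORDER BY` clauses change; the join & the percent change formula stay inside the view.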

## Creating & Refreshing a Materialised View

A materialised view differs from a standard view in that upon its creation, the materialised view's stored query is executed, & the results it generates are saved in the database. In effect, this creates a new table. The view retains its stored query, so you can update the saved data by issuing a command to refresh the view. A good use for materialised views is to preprocess complex queries that take a while to run & make those results available for faster querying.

Let's drop the `nevada_counties_pop_2019` view & re-create it as a materialised view using the code in the query below:

```
DROP VIEW nevada_counties_pop_2019;

CREATE MATERIALIZED VIEW nevada_counties_pop_2019
AS (
    SELECT county_name,
           state_fips,
           county_fips,
           pop_est_2019
    FROM us_counties_pop_est_2019
    WHERE state_name = 'Nevada'
);
```

First, we use a `DROP VIEW` statement to remove the `nevada_counties_pop_2019` view from the database. Then, we run `CREATE MATERIALIZED VIEW` to make the view. Notice that the syntax is the same as for making a standard view, except for the added `MATERIALIZED` keyword & the omission of `OR REPLACE`, which is not available in the materialised view syntax. After running the statement, the database should respond with the message `SELECT 17`, telling you that the view's query produced 17 rows to be stored in the view. We can now query this data as we would with a standard view.

<img src = "Creating a Materialised View.png" width = "600" style = "margin:auto"/>

Let's say that the population estimates stored in `us_counties_pop_est_2019` are revised. To update the data stored in the materialised view, we can use the `REFRESH` keyword, as in the query below:

```
REFRESH MATERIALIZED VIEW nevada_counties_pop_2019;
```

Executing this statement reruns the query stored in the `nevada_counties_pop_2019` view; the server will respond with the message `REFRESH MATERIALIZED VIEW`. The view will now reflect any updates to the data referenced by the view's query. When you have a query that takes some time to run, you can save time by storing its results in a materialised view that's refreshed periodically, letting users quickly access the stored data rather than run a lengthy query.

<img src = "Refreshing a Materialised View.png" width = "600" style = "margin:auto"/>
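One way to see the difference between stored & live data is to change the base table inside a transaction & compare the view before & after a refresh. The sketch below assumes the county is stored as `'Clark County'` in `county_name`; the adjustment of 1,000 is an arbitrary illustrative number. It ends with `ROLLBACK`, so nothing is changed permanently.

```sql
BEGIN;

-- Simulate a revised estimate in the base table.
UPDATE us_counties_pop_est_2019
SET pop_est_2019 = pop_est_2019 + 1000
WHERE state_name = 'Nevada'
  AND county_name = 'Clark County';

-- The materialised view still returns the value saved earlier.
SELECT county_name, pop_est_2019
FROM nevada_counties_pop_2019
WHERE county_name = 'Clark County';

-- Rerun the stored query & overwrite the saved rows.
REFRESH MATERIALIZED VIEW nevada_counties_pop_2019;

-- Now the view shows the revised value.
SELECT county_name, pop_est_2019
FROM nevada_counties_pop_2019
WHERE county_name = 'Clark County';

-- Undo both the update & the refresh.
ROLLBACK;
```

A standard view would have shown the revised value immediately, because its query runs each time the view is accessed.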

To delete a materialised view, we use a `DROP MATERIALIZED VIEW` statement. Also note that materialised views appear in a different part of pgAdmin's object browser, under **Schemas -> public -> Materialized Views**.

<img src = "Visualise Materialized Views.png" width = "600" style = "margin:auto"/>

## Inserting, Updating, & Deleting Data Using a View

With nonmaterialised views, you can update or insert data in the underlying table being queried as long as the view meets certain conditions. One requirement is that the view must reference a single table or updatable view. If the view's query joins tables, as with the population change view we built in the previous section, you can't perform inserts or updates to the original table directly. Also, the view's query can't contain `DISTINCT`, `WITH`, `GROUP BY`, or certain other clauses. For a complete list of restrictions, see the [PostgreSQL documentation for `CREATE VIEW`](https://www.postgresql.org/docs/current/sql-createview.html).
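PostgreSQL records whether it considers each view automatically updatable in `information_schema.views`. As a quick check, assuming your views live in the `public` schema, a query along these lines lists them:

```sql
SELECT table_name,
       is_updatable,
       is_insertable_into
FROM information_schema.views
WHERE table_schema = 'public'
ORDER BY table_name;
```

A view that joins tables, such as `county_pop_change_2019_2010`, should show `NO` in both columns. Materialised views are not listed in `information_schema.views` at all.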

We already know how to directly insert & update data on a table, so why do it through a view? One reason is that a view is one way you can exercise control over which data a user can update.

### Creating a View of Employees

In a previous lesson, we created & filled the `departments` & `employees` tables with four rows about people & where they work. Running a quick `SELECT * FROM employees ORDER BY emp_id;` query shows the table's contents:

<img src = "Viewing the employees Table.png" width = "600" style = "margin:auto"/>

Let's say we want to use a view to give users in the Tax Department (its `dept_id` is `1`) the ability to add, remove, or update their employees' names without letting them change salary information or the data of employees in another department. To do this, we set up a view with the query below:

```
CREATE OR REPLACE VIEW employees_tax_dept
WITH (security_barrier)
AS (
    SELECT emp_id,
           first_name,
           last_name,
           dept_id
    FROM employees
    WHERE dept_id = 1
)
WITH LOCAL CHECK OPTION;
```

This view is similar to others we've created so far, but with a few additions. First, in the `CREATE OR REPLACE VIEW` statement, we add the keywords `WITH (security_barrier)`. This enables a level of database security to prevent a malicious user from getting around restrictions that the view places on rows & columns. See [here](https://www.postgresql.org/docs/current/rules-privileges.html) for how someone might subvert a view if you omit this type of security.

In the view's `SELECT` query, we pick the columns we want to show from the `employees` table & use `WHERE` to filter the results on `dept_id = 1` to list only Tax Department staff. The view itself will restrict updates or deletes to rows matching the condition in the `WHERE` clause. Adding the keywords `WITH LOCAL CHECK OPTION` restricts inserts as well, allowing users to add new Tax Department employees only (if the view definition omitted these keywords, you could use it to insert a row with a `dept_id` of `3`, for example, which wouldn't be part of the Tax Department). The `LOCAL CHECK OPTION` also prevents a user from changing an employee's `dept_id` to a value other than `1`.

Create the `employees_tax_dept` view by running the query. Then run `SELECT * FROM employees_tax_dept ORDER BY emp_id;`, which should return these two rows:

<img src = "Viewing the employees_tax_dept View.png" width = "600" style = "margin:auto"/>

The result shows the employees who work in the Tax Department; they're two of the four rows in the entire `employees` table.
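To put the view to work as an access control, a database administrator would typically grant privileges on the view rather than on the table. The following is only a sketch: the role name `tax_dept_staff` is hypothetical, & it assumes you have permission to create roles & manage privileges.

```sql
-- Hypothetical group role for Tax Department users.
CREATE ROLE tax_dept_staff;

-- Allow reading & changing rows through the view only.
GRANT SELECT, INSERT, UPDATE, DELETE
ON employees_tax_dept
TO tax_dept_staff;

-- Make sure the role holds no direct privileges on the base table.
REVOKE ALL ON employees FROM tax_dept_staff;
```

By default, a view accesses its underlying table with the privileges of the view's owner, so members of the role can work with `employees_tax_dept` without holding any rights on `employees` itself.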

### Inserting Rows Using the employees_tax_dept View

We can use a view to insert or update data, but instead of using the table name in the `INSERT` or `UPDATE` statement, we substitute the view name. After we add or change data using a view, the change is applied to the underlying table, which in this case is `employees`. The view then reflects the change via the query it runs.

The query below shows two examples that attempt to add new employee records via the `employees_tax_dept` view. The first succeeds, but the second fails:

```
INSERT INTO employees_tax_dept (
    emp_id, first_name, last_name, dept_id
)
VALUES (5, 'Suzanne', 'Legere', 1);

INSERT INTO employees_tax_dept (
    emp_id, first_name, last_name, dept_id
)
VALUES (6, 'Jamil', 'White', 2);

SELECT * FROM employees_tax_dept ORDER BY emp_id;

SELECT * FROM employees ORDER BY emp_id;
```

In the first `INSERT`, we supply the first & last names of Suzanne Legere plus her `emp_id` & `dept_id`. Because the new row satisfies the `LOCAL CHECK` in the view -- it contains the same columns & its `dept_id` is `1` -- the insert succeeds when it executes:

<img src = "Successful & Rejected Inserts via the employees_tax_dept View 1.png" width = "600" style = "margin:auto"/>

But when we run the second `INSERT` to add an employee named Jamil White using a `dept_id` of `2`, the operation fails with the error message `new row violates check option for view "employees_tax_dept"`. The reason is that when we created the view, we used a `WHERE` clause to return only rows with `dept_id = 1`. The `dept_id` of `2` doesn't pass the `LOCAL CHECK`, so the row is prevented from being inserted.

<img src = "Successful & Rejected Inserts via the employees_tax_dept View 2.png" width = "600" style = "margin:auto"/>

If we run the `SELECT` statement on the `employees_tax_dept` view, we'll see that Suzanne Legere was successfully added:

<img src = "Successful & Rejected Inserts via the employees_tax_dept View 3.png" width = "600" style = "margin:auto"/>

We also query the `employees` table to confirm that Suzanne Legere was added to the full table. The view queries the `employees` table each time we access it.

<img src = "Successful & Rejected Inserts via the employees_tax_dept View 4.png" width = "600" style = "margin:auto"/>

As you can see from the addition of Suzanne Legere, the data we add using a view is also added to the underlying table. However, because the view doesn't include the `salary` column, the value in her row is `NULL`. If you attempt to insert a salary value using the view, you would receive the error message `column "salary" of relation "employees_tax_dept" does not exist`. The reason is that even though the `salary` column exists in the underlying `employees` table, it's not referenced in the view. Again, this is one way to limit access to sensitive data. If you plan to take on database administrator responsibilities, it's worth learning more about granting permissions alongside `WITH (security_barrier)`.

### Updating Rows Using the employees_tax_dept View

The same restrictions on accessing data in an underlying table apply when we update data using the `employees_tax_dept` view. The query below shows a standard query to change the spelling of Suzanne's last name using `UPDATE`.

```
UPDATE employees_tax_dept
SET last_name = 'Le Gere'
WHERE emp_id = 5;

SELECT * FROM employees_tax_dept ORDER BY emp_id;
```

Run the query, & the result from the `SELECT` query should show the updated last name; the change itself is made in the underlying `employees` table:

<img src = "Updating a Row via the employees_tax_dept View.png" width = "600" style = "margin:auto"/>

Suzanne's last name is now correctly spelled as Le Gere, not Legere.

However, if we try to update the name of an employee who's not in the Tax Department, the update has no effect: the row isn't visible through the view, so PostgreSQL reports `UPDATE 0`. Trying to use this view to update the salary of an employee -- even one in the Tax Department -- fails with an error, just as inserting a salary did. If the view doesn't reference a column in the underlying table, you can't access that column through the view. Again, the fact that updates on views are restricted in this way offers ways to secure & hide certain pieces of data.
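The statements below sketch these restrictions, along with the `dept_id` rule enforced by the check option. They assume that `emp_id` `3` belongs to an employee outside the Tax Department & that Suzanne's row (`emp_id` `5`) still exists; the new name & salary values are arbitrary. Run them one at a time, because the last two raise errors.

```sql
-- The row is outside the view's WHERE filter, so nothing matches:
-- PostgreSQL reports UPDATE 0 & the table is unchanged.
UPDATE employees_tax_dept
SET last_name = 'Smith'
WHERE emp_id = 3;

-- The salary column is not part of the view, so this raises
-- an error saying the column does not exist.
UPDATE employees_tax_dept
SET salary = 100000
WHERE emp_id = 5;

-- Moving an employee out of department 1 would make the row
-- invisible to the view, which WITH LOCAL CHECK OPTION forbids.
UPDATE employees_tax_dept
SET dept_id = 2
WHERE emp_id = 5;
```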

### Deleting Rows Using the employees_tax_dept View

Now let's explore how to delete rows using a view. The restrictions on which data you can affect apply here as well. For example, if Suzanne Le Gere gets a better offer from another firm & decides to leave, you can remove her from `employees` through the `employees_tax_dept` view. Here is a query with the standard `DELETE` syntax:

```
DELETE FROM employees_tax_dept
WHERE emp_id = 5;
```

Run the query & PostgreSQL should respond with `DELETE 1`.

<img src = "Deleting a Row via the employees_tax_dept View.png" width = "600" style = "margin:auto"/> 

However, when you try to delete a row for an employee in a department other than the Tax Department, PostgreSQL won't allow it & will report `DELETE 0`.

In summary, views not only give you control over access to data, but also give you shortcuts for working with data.

---

# Creating Your Own Functions & Procedures

You've used functions throughout the course, such as to capitalise letters with `upper()` or add numbers with `sum()`. Behind these functions is a significant amount of (sometimes complex) programming that executes a series of actions & may, depending on the job of the function, return a response. We'll build some basic functions that we can use as a launchpad for more complex functions. Even simple functions can help avoid repeating code.

Much of the syntax in this section is specific to PostgreSQL, which supports both user-defined functions & *procedures* (the difference between the two is subtle). You can define functions & procedures using plain SQL, but you can also choose from other options. One is a PostgreSQL-specific *procedural language* called PL/pgSQL that adds features not found in standard SQL, such as logical control structures (`IF ... THEN ... ELSE`). Other options include PL/Python & PL/R for the Python & R programming languages.

## Creating the percent_change() Function

A function processes data & returns a value. As an example, let's write a function to simplify a staple of data analysis: calculating the percent change between two values. In a previous lesson, we learned that we express the percent change formula this way:

```
percent change = (New Number - Old Number) / Old Number * 100
```

Rather than writing the formula each time we need it, we can create a function called `percent_change()` that takes the new & old numbers as inputs & returns the result rounded to a user-specified number of decimal places. Let's walk through the code below to see how to declare a simple function that uses SQL:

```
CREATE OR REPLACE FUNCTION
percent_change(new_value numeric,
               old_value numeric,
               decimal_places integer DEFAULT 1)
RETURNS numeric AS
'SELECT round(((new_value - old_value) / old_value) * 100,
     decimal_places);'
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
```

A lot is happening in this code, but it's not as complicated as it looks. We start with the command `CREATE OR REPLACE FUNCTION`. As with the syntax to create a view, the `OR REPLACE` keywords are optional. We then give the name of the function &, in parentheses, a list of *arguments* that determine the function's inputs. Each argument serves as an input to the function & gets a name & data type. For example, `new_value` & `old_value` are `numeric` & require that the user of the function supply input values matching that type, whereas `decimal_places` (which specifies the number of places to round results) is `integer`. For `decimal_places`, we specify `1` as the `DEFAULT` value -- this makes the argument optional &, if the user omits it, sets the argument to `1`.

We then use the keywords `RETURNS numeric AS` to tell the function to return its calculation as type `numeric`. If this were a function to concatenate strings, we might return `text`.

Next, we write the meat of the function that performs the calculation. Inside single quotes, we place a `SELECT` query that includes the percent change calculation nested inside a `round()` function. In the formula, we use the function's argument names instead of numbers.

We then supply a series of keywords that define the function's attributes & behaviour. The `LANGUAGE` keyword specifies that we've written this function using plain SQL as opposed to one of the other languages PostgreSQL supports for creating functions. Next, the `IMMUTABLE` keyword indicates that the function cannot modify the database & will always return the same result for a given set of arguments. The line `RETURNS NULL ON NULL INPUT` guarantees that the function will supply a `NULL` response if any input that is not supplied by default is a `NULL`.

Run the code using pgAdmin to create the `percent_change()` function. The server should respond with the message `CREATE FUNCTION`.

<img src = "Creating a percent_change() Function.png" width = "600" style = "margin:auto"/>

## Using the percent_change() Function

To test the new `percent_change()` function, run it by itself using `SELECT`, as shown in the query below:

```
SELECT percent_change(110, 108, 2);
```

This example uses a value of `110` for the new number, `108` for the old number, & `2` as the desired number of decimal places to round the result.

The result should look like this:

<img src = "percent_change() Function Test.png" width = "600" style = "margin:auto"/>

The result tells us that there's a 1.85 percent increase between 108 & 110. Experiment with other numbers to see how the results change. Also, try changing the `decimal_places` argument to values including `0`, or omit it, to see how that affects the output. You should see results that have more or fewer digits after the decimal point, based on your input.
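A few more calls, offered as a sketch, exercise the default argument & the `NULL` handling. The named notation with `=>` is standard PostgreSQL syntax for passing arguments by name; it isn't used elsewhere in this lesson.

```sql
-- Omitting decimal_places falls back to the default of 1.
SELECT percent_change(110, 108);            -- 1.9

-- Named notation makes each argument self-describing.
SELECT percent_change(new_value => 110,
                      old_value => 108,
                      decimal_places => 0); -- 2

-- RETURNS NULL ON NULL INPUT: any NULL argument yields NULL.
SELECT percent_change(110, NULL);           -- NULL
```

One case the function doesn't guard against is an `old_value` of `0`, which raises a division-by-zero error.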

We created this function to avoid writing the full percent change formula in queries. Let's use it to calculate percent change using a version of the census estimates population change query we wrote in a previous lesson, as shown below:

```
SELECT c2019.county_name,
       c2019.state_name,
       c2019.pop_est_2019 AS pop_2019,
       percent_change(c2019.pop_est_2019,
                      c2010.estimates_base_2010)
           AS pct_chg_func,
       round((c2019.pop_est_2019::numeric -
           c2010.estimates_base_2010) /
           c2010.estimates_base_2010 * 100, 1)
           AS pct_change_formula
FROM us_counties_pop_est_2019 AS c2019
JOIN us_counties_pop_est_2010 AS c2010
    ON c2019.state_fips = c2010.state_fips
        AND c2019.county_fips = c2010.county_fips
ORDER BY pct_chg_func DESC
LIMIT 5;
```

The query above is a modified version of the old query -- we add the `percent_change()` function as a column in `SELECT`. We also include the explicit percent change formula so we can compare results. As inputs, we use the 2019 population estimate column (`c2019.pop_est_2019`) as the new number & the 2010 estimates base (`c2010.estimates_base_2010`) as the old.

The query results should display the five counties with the greatest percent change in population, & the results from the function should match the results from the formula entered directly into the query. Note that each value in the `pct_chg_func` column has one decimal place, the function's default value, because we didn't provide the optional argument. Here's the result with both the function & the formula.

<img src = "Testing percent_change() on Census Data.png" width = "600" style = "margin:auto"/>

Now that we know the function works as intended, we can use `percent_change()` any time we need to solve that calculation -- that's faster than writing out the formula.

## Updating Data with a Procedure

As implemented in PostgreSQL, a *procedure* is a close relative of a function, albeit with some significant differences. Both procedures & functions can perform data operations that don't return a value, such as an update. Unlike functions, however, procedures don't have a clause to return a value. Procedures can also incorporate transaction commands such as `COMMIT` & `ROLLBACK`, whereas functions cannot. Many database managers implement procedures, which are sometimes referred to as *stored procedures*. PostgreSQL added procedures in version 11; they are part of the SQL standard, though PostgreSQL's syntax is not fully compatible.

We can simplify routine updates to data using procedures. In this section, we'll write a procedure that updates a record of the correct number of personal days off a teacher gets (in addition to vacation days) based on the time elapsed since their hire date.

For this exercise, we'll return to the `teachers` table. Let's add a column to `teachers` to hold the teachers' personal days using the code below. The new column will be empty until we fill it later using a procedure.

```
ALTER TABLE teachers
ADD COLUMN personal_days integer;

SELECT first_name,
       last_name,
       hire_date,
       personal_days
FROM teachers;
```

The query alters the `teachers` table using `ALTER TABLE` & adds the `personal_days` column using the keywords `ADD COLUMN`. We then run the `SELECT` statement to view the data, in which we also include the names & hire dates of each teacher. When both statements finish, you should see the following six rows:

<img src = "Adding a Column to the teachers Table & Seeing the Data.png" width = "600" style = "margin:auto"/>

The `personal_days` column contains only `NULL` values because we haven't inserted anything yet.

Now, let's create a procedure called `update_personal_days()` that populates the `personal_days` column with each teacher's earned personal days (in addition to vacation days). We'll use the following criteria:

* Less than 10 years since hire: 3 personal days
* 10 to less than 15 years since hire: 4 personal days
* 15 to less than 20 years since hire: 5 personal days
* 20 to less than 25 years since hire: 6 personal days
* 25 years or more since hire: 7 personal days

The query below creates the procedure. This time, instead of using plain SQL, we'll incorporate elements of the PL/pgSQL procedural language, which is an additional language PostgreSQL supports for writing functions & procedures. Let's walk through some differences.

```
CREATE OR REPLACE PROCEDURE update_personal_days()
AS $$
BEGIN
    UPDATE teachers
    SET personal_days =
        CASE WHEN (now() - hire_date) >=
                  '10 years'::interval AND (now() -
                  hire_date) < '15 years'::interval THEN 4
             WHEN (now() - hire_date) >=
                  '15 years'::interval AND (now() -
                  hire_date) < '20 years'::interval THEN 5
             WHEN (now() - hire_date) >=
                  '20 years'::interval AND (now() -
                  hire_date) < '25 years'::interval THEN 6
             WHEN (now() - hire_date) >=
                  '25 years'::interval THEN 7
             ELSE 3
        END;
    RAISE NOTICE 'personal_days updated!';
END;
$$
LANGUAGE plpgsql;
```

We begin with `CREATE OR REPLACE PROCEDURE` & give the procedure a name. This time, we provide no arguments because no user input is required -- the procedure operates on predetermined columns with set values for calculating intervals.

Often when writing PL/pgSQL-based functions, the PostgreSQL convention is to use the non-ANSI SQL standard dollar-quote (`$$`) to mark the start & end of the string that contains all the function's commands. (As with the `percent_change()` SQL function earlier, you could use single quote marks to enclose the string, but then any single quotes in the string would need to be doubled, & that not only looks messy but can be confusing.) So, everything between the pair of `$$` is the code that does the work. You can also add some text between the dollar signs, like `$namestring$`, to create a unique pair of beginning & ending quotes. This is useful, for example, if you need to quote a query inside the function.
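The three quoting styles can be compared side by side in a plain `SELECT`. This is only an illustration of the string syntax; the tag name `msg` & the column aliases are arbitrary.

```sql
SELECT 'It''s done'                 AS single_quoted,
       $$It's done$$                AS dollar_quoted,
       $msg$Holds $$ and 'quotes'$msg$ AS tagged_quoted;
```

Each column returns its text with the quote characters intact. Only the first form requires the apostrophe to be doubled, & only the tagged form can contain a bare `$$`.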

Right after the first `$$` we start a `BEGIN ... END;` block. This is a PL/pgSQL convention that delineates the start & end of a section of code; you can nest one `BEGIN ... END;` inside another to facilitate logical groupings of code. Inside that block, we place an `UPDATE` statement that uses a `CASE` statement to determine the number of days each teacher gets. We subtract the `hire_date` from the current date, which is retrieved from the server by the `now()` function. Depending on which range `now() - hire_date` falls into, the `CASE` statement returns the number of personal days corresponding to the range. We use the PL/pgSQL keywords `RAISE NOTICE` to display a message that the procedure is done. Finally, we use the `LANGUAGE` keyword so the database knows to interpret what we've written according to the syntax specific to PL/pgSQL.
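Nested blocks are easiest to see in a tiny standalone example. The sketch below uses `DO`, a PostgreSQL command that runs an anonymous PL/pgSQL block without creating a function or procedure; it isn't used elsewhere in this lesson, & the notice texts are arbitrary.

```sql
DO $$
BEGIN
    RAISE NOTICE 'outer block starts';

    -- A nested block groups related statements.
    BEGIN
        RAISE NOTICE 'inner block runs';
    END;

    RAISE NOTICE 'outer block ends';
END;
$$;
```

The server should print the three notices in order & then report `DO`.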

Run the code above to create the `update_personal_days()` procedure. To invoke the procedure, we use the `CALL` command, which is part of the ANSI SQL standard:

```
CALL update_personal_days();
```

When the procedure runs, the server responds with the notice it raises, which is `personal_days updated!`.

<img src = "Running the update_personal_days() Procedure.png" width = "600" style = "margin:auto"/>

When we rerun the `SELECT` statement on our `teachers` table, we should see that each row of the `personal_days` column is filled with the appropriate value. Note that results will vary depending on when you run this procedure, because calculations using `now()` change as time passes.

<img src = "Checking Results of update_personal_days() Procedure.png" width = "600" style = "margin:auto"/>
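To sanity-check the assignments, it can help to display the time since hire next to the value the procedure wrote. This sketch uses PostgreSQL's built-in `age()` function, which is not part of the procedure itself; the alias `time_since_hire` is just a label.

```sql
SELECT first_name,
       last_name,
       hire_date,
       age(hire_date) AS time_since_hire,
       personal_days
FROM teachers
ORDER BY hire_date;
```

One thing to keep in mind when reading the output: `now() - hire_date` produces an interval counted in days, & PostgreSQL compares intervals by treating a month as 30 days. A teacher who is within a few months of a threshold may therefore be worth checking by hand.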

You could call the `update_personal_days()` procedure manually to update the data after performing certain tasks, or you could use a task scheduler such as pgAgent (a separate open source tool) to run it automatically.

## Using the Python Language in a Function

Previously, we mentioned that PL/pgSQL is the default procedural language within PostgreSQL, but the database also supports creating functions using open source languages, such as Python & R. This support allows us to take advantage of features & modules from those languages within functions we create. For example, we can use the `pandas` library for analysis. The PostgreSQL documentation provides a comprehensive review of the languages included with PostgreSQL, but here we'll show a simple function using Python.

To enable PL/Python, you must create the extension using the below code:

```
CREATE EXTENSION plpython3u;
```

<img src = "Enabling the PL:Python Procedural Language.png" width = "600" style = "margin:auto"/>

If you get an error, such as `image not found`, that means the PL/Python extension is not installed on your system. Depending on the operating system, installation of PL/Python typically requires installation of Python & additional configuration beyond the basic PostgreSQL install.

After enabling the extension, we can create a function using syntax similar to the examples you've tried so far, but using Python for the body of the function. The query below shows how to use PL/Python to create a function called `trim_county()` that removes the word *County* from the end of a string. We'll use this function to clean up names of counties in the census data.

```
CREATE OR REPLACE FUNCTION trim_county(input_string text)
RETURNS text AS
$$
    import re
    cleaned = re.sub(r' County', '', input_string)
    return cleaned
$$
LANGUAGE plpython3u;
```

The structure should look familiar. After naming the function & its text input, we use the `RETURNS` keyword to specify that the function will send text back. After the opening `$$` quotes, we get straight to the Python code, starting with a statement to import the Python regular expressions module, `re`. Even if you don't know much about Python, you can probably deduce that the next line of code sets a variable called `cleaned` to the result of a Python regular expression function called `sub()`. That function looks for a space followed by the word *County* in the `input_string` passed into the function & substitutes an empty string, which is denoted by two apostrophes. Then the function returns the content of the variable `cleaned`. To end, we specify `LANGUAGE plpython3u` to note we're writing the function with PL/Python.

<img src = "Using PLPython to Create the `trim_county()` Function.png" width = "600" style = "margin:auto"/>

Run the code to create the function, & then execute the `SELECT` statement in the query below to see it in action.

```
SELECT county_name,
       trim_county(county_name)
FROM us_counties_pop_est_2019
ORDER BY state_fips, county_fips
LIMIT 5;
```

We use the `county_name` column in the `us_counties_pop_est_2019` table as input to `trim_county()`. That should return these results:

<img src = "Testing the trim_county()` Function.png" width = "600" style = "margin:auto"/>

As you can see, the `trim_county()` function evaluated each value in the `county_name` column & removed a space & the word *County* when present. Although this is a trivial example, it shows how easy it is to use Python -- or one of the other supported procedural languages -- inside a function.
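The regular expression can also be tried in plain Python outside the database. The sketch below copies the body of `trim_county()` into an ordinary function & adds a hypothetical variant, `trim_county_anchored()` (not part of the original listing), whose pattern ends with `$` so that it matches only at the end of the string. The string `'Kings County Hospital'` is made up purely to show where the two patterns differ.

```python
import re


def trim_county(input_string):
    # Same substitution as the PL/Python function body: removes every
    # occurrence of " County", wherever it appears in the string.
    return re.sub(r' County', '', input_string)


def trim_county_anchored(input_string):
    # Hypothetical variant (an assumption, not in the original listing):
    # the trailing $ limits the match to the end of the string.
    return re.sub(r' County$', '', input_string)


print(trim_county('Autauga County'))                   # Autauga
print(trim_county('Kings County Hospital'))            # Kings Hospital
print(trim_county_anchored('Kings County Hospital'))   # Kings County Hospital
```

For county names that end in *County*, both versions behave the same; the anchored pattern only matters if the word can also appear mid-string.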

---

# Automating Database Actions with Triggers

A database *trigger* executes a function whenever a specified event, such as an `INSERT`, `UPDATE`, or `DELETE`, occurs on a table or view. You can set a trigger to fire before, after, or instead of the event, & you can also set it to fire once for each row affected by the event or just once per operation. For example, let's say you delete 20 rows from a table. You could set the trigger to fire once for each of the 20 rows deleted or just one time.

We'll work through two examples. The first example keeps a log of changes made to grades at a school. The second automatically classifies temperatures each time we collect a reading.

## Logging Grade Updates to a Table

Let's say we want to automatically track changes made to a student `grades` table in our school's database. Every time a row is updated, we want to record the old & new grade plus the time the change occurred. To handle this task automatically, we'll need three items:

* A `grades_history` table to record the changes to grades in a `grades` table.
* A trigger, which we'll name `grades_update`, to run a function every time a change occurs in the `grades` table.
* The function the trigger will execute, which we'll call `record_if_grade_changed()`.

### Creating Tables to Track Grades & Updates

Let's start by making the tables we need. The query below includes the code to first create & fill `grades` & then create `grades_history`.

```
CREATE TABLE grades (
    student_id bigint,
    course_id bigint,
    course text NOT NULL,
    grade text NOT NULL,
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO grades
VALUES (1, 1, 'Biology 2', 'F'),
       (1, 2, 'English 11B', 'D'),
       (1, 3, 'World History 11B', 'C'),
       (1, 4, 'Trig 2', 'B');

CREATE TABLE grades_history (
    student_id bigint NOT NULL,
    course_id bigint NOT NULL,
    change_time timestamp with time zone NOT NULL,
    course text NOT NULL,
    old_grade text NOT NULL,
    new_grade text NOT NULL,
    PRIMARY KEY (student_id, course_id, change_time)
);
```

These commands are straightforward. We use `CREATE TABLE` to make a `grades` table & add four rows using `INSERT`, where each row represents a student's grade in a class. Then we use `CREATE TABLE` again to make the `grades_history` table to hold the data we log each time an existing grade is altered. The `grades_history` table has columns for the new grade, old grade, & the time of the change. Run the code to create the tables & fill the `grades` table. We insert no data into `grades_history` here because the trigger process will handle that task.

### Creating the Function & Trigger

Next, let's write the `record_if_grade_changed()` function that the trigger will execute (note that the PostgreSQL documentation refers to such functions as *trigger procedures*). We must write the function before naming it in the trigger. Let's go through the query below:

```
CREATE OR REPLACE FUNCTION record_if_grade_changed()
RETURNS trigger AS
$$
BEGIN
    IF NEW.grade <> OLD.grade THEN
        INSERT INTO grades_history (
            student_id,
            course_id,
            change_time,
            course,
            old_grade,
            new_grade
        )
        VALUES (OLD.student_id,
                OLD.course_id,
                now(),
                OLD.course,
                OLD.grade,
                NEW.grade);
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;
```

The `record_if_grade_changed()` function follows the pattern of earlier examples but with differences specific to working with triggers. First, we specify `RETURNS trigger` instead of a data type. We use dollar-quotes to delineate the code portion of the function, & because `record_if_grade_changed()` is a PL/pgSQL function, we also place the code to execute inside a `BEGIN ... END;` block. Next, we start the procedure using an `IF ... THEN` statement, which is one of the control structures PL/pgSQL provides. We use it here to run the `INSERT` statement only if the updated grade is different from the old grade, which we check using the `<>` operator.

When a change occurs to the `grades` table, the trigger (which we'll create next) will execute. For each row that's changed, the trigger will pass two collections of data into `record_if_grade_changed()`. The first is the row values *before* they were changed, noted with the prefix `OLD`. The second is the row values *after* they were changed, noted with the prefix `NEW`. The function can access the original row values & the updated row values, which it will use for a comparison. If the `IF ... THEN` statement evaluates as `true`, indicating that the old & new `grade` values are different, we use `INSERT` to add a row to `grades_history` that contains both `OLD.grade` & `NEW.grade`. Finally, we include a `RETURN` statement with a value of `NULL`. The trigger procedure performs a database `INSERT`, so we don't need a value returned.
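To make the role of `OLD` & `NEW` concrete, here is a minimal plain-Python sketch of the same decision logic. It is only an analogy, not how PostgreSQL runs the trigger: the dictionaries `before` & `after` stand in for the `OLD` & `NEW` records, the list `grades_history` stands in for the table, & the change from `F` to `C` is a hypothetical update.

```python
from datetime import datetime, timezone

grades_history = []  # stands in for the grades_history table


def record_if_grade_changed(old, new):
    # old and new play the role of the OLD and NEW row records.
    if new['grade'] != old['grade']:
        grades_history.append({
            'student_id': old['student_id'],
            'course_id': old['course_id'],
            'change_time': datetime.now(timezone.utc),  # like now()
            'course': old['course'],
            'old_grade': old['grade'],
            'new_grade': new['grade'],
        })
    return None  # mirrors RETURN NULL


before = {'student_id': 1, 'course_id': 1, 'course': 'Biology 2', 'grade': 'F'}
after = dict(before, grade='C')  # hypothetical update of the grade

record_if_grade_changed(before, after)  # grade changed: one row is logged
record_if_grade_changed(after, after)   # grade unchanged: nothing is logged
print(len(grades_history))              # 1
```

The second call shows why the `IF ... THEN` check matters: an update that leaves the grade as it was adds nothing to the history.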

Run the code to create the function. Then, we can add a `grades_update` trigger to the `grades` table with the code below:

```
CREATE TRIGGER grades_update
AFTER UPDATE ON grades
FOR EACH ROW
EXECUTE PROCEDURE record_if_grade_changed();
```

In PostgreSQL, the syntax for creating a trigger follows the ANSI SQL standard (although not all aspects of the standard are supported, per the [documentation](https://www.postgresql.org/docs/current/sql-createtrigger.html)). The code begins with a `CREATE TRIGGER` statement, followed by clauses that control when the trigger runs & how it behaves. We use `AFTER UPDATE` to specify that we want the trigger to fire after the update occurs on the `grades` row. We could also use the `BEFORE` or `INSTEAD OF` keywords depending on the need.

We write `FOR EACH ROW` to tell the trigger to execute the procedure once for each row updated in the table. For example, if someone runs an update that affects three rows, the procedure will run three times. The alternative (& default) is `FOR EACH STATEMENT`, which runs the procedure once per statement, no matter how many rows it affects. If we didn't care about capturing changes to each row & simply wanted to record that grades were changed at a certain time, we could use that option. Finally, we use `EXECUTE PROCEDURE` to name `record_if_grade_changed()` as the function the trigger should run.

Create the trigger by running the code in pgAdmin. The database should respond with the message `CREATE TRIGGER`.

<img src = "Creating the grades_update Trigger.png" width = "600" style = "margin:auto"/>

### Testing the Trigger

Now that we've created the trigger & the function, it should run when data in the `grades` table changes; let's see what the process does. First, let's check the current status of our data. When you run `SELECT * FROM grades_history;`, you'll see that the table is empty because we haven't made any changes to the `grades` table yet & there is nothing to track. Next, when you run `SELECT * FROM grades ORDER BY student_id, course_id;` you should see the grade data that you inserted, as shown here:

<img src = "Viewing the grades Table.png" width = "600" style = "margin:auto"/>

That Biology 2 grade doesn't look very good. Let's update it using the code below:

```
UPDATE grades
SET grade = 'C'
WHERE student_id = 1 AND course_id = 1;
```

When you run the `UPDATE`, pgAdmin doesn't display anything to let you know that the trigger executed in the background. It just reports `UPDATE 1`, meaning a row was updated. 

<img src = "Updating the grades Table.png" width = "600" style = "margin:auto"/>

But our trigger did run, which we can confirm by examining columns in `grades_history` using this `SELECT` query:

```
SELECT student_id,
       change_time,
       course,
       old_grade,
       new_grade
FROM grades_history;
```

When you run this query, you should see that the `grades_history` table, which contains all changes to grades, now has one row:

<img src = "Viewing the grades_history Table.png" width = "600" style = "margin:auto"/>

This row displays the old Biology 2 grade of `F`, the new value `C`, & `change_time`, showing the time of update (your result should reflect your date & time). Note that the addition of this row to `grades_history` happened in the background without the knowledge of the person making the update. But the `UPDATE` event on the table caused the trigger to fire, which executed the `record_if_grade_changed()` function. Now we have a record of the change -- that's pretty cool. The ability to trigger actions on a database automatically gives us more control over our data.

## Automatically Classifying Temperatures

We can use the SQL `CASE` statement to reclassify temperature readings into descriptive categories. The `CASE` statement is also part of the PL/pgSQL procedural language, & we can use its capability to assign values to variables to automatically store those category names in a table each time we add a temperature reading. If we're routinely collecting temperature readings, using this technique to automate the classification spares us from having to handle the task manually.

We'll follow the same steps we used for logging grade changes: we first create a function to classify the temperatures & then create a trigger to run the function each time a row is inserted into the table. Use the query below to create a `temperature_test` table for this exercise:

```
CREATE TABLE temperature_test (
    station_name text,
    observation_date date,
    max_temp integer,
    min_temp integer,
    max_temp_group text,
    PRIMARY KEY (station_name, observation_date)
);
```

The `temperature_test` table contains columns to hold the name of the station & date of the temperature observation. Let's imagine that we have some process to insert a row once a day that provides the maximum & minimum temperature for that location, & we need to fill the `max_temp_group` column with a descriptive classification of the day's high reading to provide text to a weather forecast we're distributing.

To do this, we first make a function called `classify_max_temp()`, as shown in the query below:

```
CREATE OR REPLACE FUNCTION classify_max_temp()
RETURNS trigger AS
$$
BEGIN
    CASE
        WHEN NEW.max_temp >= 90 THEN
            NEW.max_temp_group := 'Hot';
        WHEN NEW.max_temp >= 70 AND NEW.max_temp < 90 THEN
            NEW.max_temp_group := 'Warm';
        WHEN NEW.max_temp >= 50 AND NEW.max_temp < 70 THEN
            NEW.max_temp_group := 'Pleasant';
        WHEN NEW.max_temp >= 33 AND NEW.max_temp < 50 THEN
            NEW.max_temp_group := 'Cold';
        WHEN NEW.max_temp >= 20 AND NEW.max_temp < 33 THEN
            NEW.max_temp_group := 'Frigid';
        WHEN NEW.max_temp < 20 THEN
            NEW.max_temp_group := 'Inhumane';
        ELSE
            NEW.max_temp_group := 'No Reading';
    END CASE;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;
```

By now, these functions should look familiar. What's new here is the PL/pgSQL version of the `CASE` syntax, which differs slightly from the SQL syntax. The PL/pgSQL syntax includes a semicolon after each `WHEN ... THEN` clause. Also new is the *assignment operator* `:=`, which we use to assign the descriptive name to the `NEW.max_temp_group` column based on the outcome of the `CASE` statement. For example, the statement `NEW.max_temp_group := 'Cold'` assigns the string `'Cold'` to `NEW.max_temp_group` when the temperature value is greater than or equal to 33 degrees but less than 50 degrees Fahrenheit. When the function returns the `NEW` row to be inserted in the table, it will include the string value `Cold`. Run the code to create the function.

Next, using the code below, we can create a trigger to execute the function each time a row is added to `temperature_test`.

```
CREATE TRIGGER temperature_insert
BEFORE INSERT ON temperature_test
FOR EACH ROW
EXECUTE PROCEDURE classify_max_temp();
```

In this example, we classify `max_temp` & create a value for `max_temp_group` prior to inserting the row into the table. Doing so is more efficient than performing a separate update after the row is inserted. To specify that behaviour, we set the `temperature_insert` trigger to fire `BEFORE INSERT`.

We also want the trigger to fire `FOR EACH ROW` because we want each `max_temp` recorded in the table to get a descriptive classification. The final `EXECUTE PROCEDURE` statement names the `classify_max_temp()` function we just created. Run the `CREATE TRIGGER` statement in pgAdmin, & then test the setup using the query below:

```
INSERT INTO temperature_test
VALUES ('North Station', '1/19/2023', 10, -3),
       ('North Station', '3/20/2023', 28, 19),
       ('North Station', '5/2/2023', 65, 42),
       ('North Station', '8/9/2023', 93, 74),
       ('North Station', '12/14/2023', NULL, NULL);

SELECT * FROM temperature_test
ORDER BY observation_date;
```

Here we insert five rows into `temperature_test`, & we expect the `temperature_insert` trigger to fire for each row -- & it does! The `SELECT` statement in the listing should display these results:

<img src = "Inserting Rows to Test the temperature_insert Trigger.png" width = "600" style = "margin:auto"/>

Thanks to the trigger & function, each `max_temp` inserted automatically receives the appropriate classification in the `max_temp_group` column -- including the instance where we had no reading for that value. Note that the trigger's update of the column will override any user-supplied values during insert.
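As a cross-check on these results, the branching in `classify_max_temp()` can be mirrored in plain Python. This is a sketch for reasoning about the thresholds, not a replacement for the trigger function. It assumes `None` stands in for SQL `NULL` & drops the upper bounds from each test, because the earlier branches have already handled the higher values.

```python
def classify_max_temp(max_temp):
    # None stands in for SQL NULL. In PL/pgSQL, no comparison with NULL
    # evaluates to true, so such a row falls through to the ELSE branch.
    if max_temp is None:
        return 'No Reading'
    if max_temp >= 90:
        return 'Hot'
    if max_temp >= 70:
        return 'Warm'
    if max_temp >= 50:
        return 'Pleasant'
    if max_temp >= 33:
        return 'Cold'
    if max_temp >= 20:
        return 'Frigid'
    return 'Inhumane'


# The max_temp values from the five rows inserted into temperature_test.
readings = [10, 28, 65, 93, None]
for temp in readings:
    print(temp, classify_max_temp(temp))
```

The five readings map to `Inhumane`, `Frigid`, `Pleasant`, `Hot`, & `No Reading`, in that order.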

This temperature example & the earlier grade-change auditing example are rudimentary, but they give you a glimpse of how useful triggers & functions can be in simplifying data maintenance.

---

# Wrapping Up

Although the techniques covered here begin to merge with those of a database administrator, you can apply the concepts to reduce the amount of time you spend repeating certain tasks.