# Table of Contents
 <p><div class="lev1 toc-item"><a href="#PostgreSQL-Functions" data-toc-modified-id="PostgreSQL-Functions-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>PostgreSQL Functions</a></div><div class="lev2 toc-item"><a href="#PostgreSQL-Function-Syntax" data-toc-modified-id="PostgreSQL-Function-Syntax-11"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>PostgreSQL Function Syntax</a></div><div class="lev2 toc-item"><a href="#Very-Simple-Example" data-toc-modified-id="Very-Simple-Example-12"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Very Simple Example</a></div><div class="lev4 toc-item"><a href="#Example-Function" data-toc-modified-id="Example-Function-1201"><span class="toc-item-num">1.2.0.1&nbsp;&nbsp;</span>Example Function</a></div><div class="lev2 toc-item"><a href="#SQL-Function-Example:" data-toc-modified-id="SQL-Function-Example:-13"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>SQL Function Example:</a></div><div class="lev1 toc-item"><a href="#PostgreSQL-Extensions" data-toc-modified-id="PostgreSQL-Extensions-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>PostgreSQL Extensions</a></div>

# PostgreSQL Functions

In PostgreSQL, stored procedures have traditionally been written as _functions_ (a separate `CREATE PROCEDURE` command was only added in version 11).
PostgreSQL functions allow you to carry out operations that would normally take several queries and round trips in a single function within the database. 

Functions also promote reuse: other applications can call your stored procedures directly instead of going through a middle tier or duplicating code.

In PostgreSQL, functions can be created in a language of your choice, such as _SQL_, _PL/pgSQL_, _C_, _Python_, etc.

PostgreSQL is very extensible in this regard.


## PostgreSQL Function Syntax

```SQL
CREATE [OR REPLACE] FUNCTION function_name (arguments)
RETURNS return_datatype AS $variable_name$
  DECLARE
    declaration;
    [...]
  BEGIN
    < function_body >
    [...]
    RETURN { variable_name | value };
  END;
$variable_name$ LANGUAGE plpgsql;
```

Where,

  * function_name specifies the name of the function.
  * The [OR REPLACE] option allows modifying an existing function.
  * A function that returns a value must contain a RETURN statement.
  * The RETURNS clause specifies the data type you are going to return from the function. The return_datatype can be a base, composite, or domain type, or can reference the type of a table column.
  * function_body contains the executable part.
  * The AS keyword introduces the function definition, which is written as a string constant (here dollar-quoted).
  * plpgsql is the name of the language that the function is implemented in. We use PL/pgSQL here, but it can be SQL, C, internal, or the name of a user-defined procedural language. For backward compatibility, the name can be enclosed in single quotes.
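
As a concrete illustration of the template above, here is a minimal sketch. The function name `filmTitle`, the variable `vTitle`, and the dollar-quote tag `$body$` are hypothetical, chosen only for this example. It assumes the `film` table of the `dvdrental` sample database and a role that is allowed to create functions.

```SQL
-- Hypothetical example: look up a film title by its id.
-- The return type references a table column via %TYPE.
CREATE OR REPLACE FUNCTION filmTitle (vFilmID integer)
RETURNS film.title%TYPE AS $body$
  DECLARE
    vTitle film.title%TYPE;    -- declaration section
  BEGIN
    SELECT title INTO vTitle   -- function body
    FROM film
    WHERE film_id = vFilmID;

    RETURN vTitle;             -- RETURN statement
  END;
$body$ LANGUAGE plpgsql;
```

The opening and closing dollar-quote tags must match exactly; an empty tag (`$$`) works as well.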


## Very Simple Example

Think back to this practice question from Day 2:

    List each film title and the number of actors in that film.

We could have answered it with this SQL:

```SQL
SELECT film.title, count(*)
FROM film_actor JOIN film USING (film_id)
GROUP BY film.title;
```
So, looking at the first 10 rows of that result in Film Title order:

In [1]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental

'Connected: dsa_ro_user@dvdrental'

In [2]:
%%sql
SELECT film.title, count(*)
FROM film_actor JOIN film USING (film_id)
GROUP BY film.title
ORDER BY film.title
LIMIT 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
10 rows affected.


title,count
Academy Dinosaur,10
Ace Goldfinger,4
Adaptation Holes,5
Affair Prejudice,5
African Egg,5
Agent Truman,7
Airplane Sierra,5
Airport Pollock,4
Alabama Devil,9
Aladdin Calendar,8


#### Example Function

```SQL
CREATE OR REPLACE FUNCTION totalActors (vFilmID integer)
RETURNS integer AS $$
DECLARE
   total integer;  -- holds the actor count for the film
BEGIN
   -- $1 refers to the first argument, vFilmID
   SELECT count(*) INTO total
   FROM film_actor
   WHERE film_id = $1;

   RETURN total;
END;
$$
SECURITY DEFINER
STRICT
LANGUAGE plpgsql;
```

This function is written in the PL/pgSQL language, which is specific to PostgreSQL and very similar to Oracle PL/SQL.
The PL/pgSQL language could be a bootcamp all its own.
We encourage you to explore this capability of the database further after you have developed a more in-depth understanding of, and comfort with, the basics of databases and SQL.
 * https://www.postgresql.org/docs/9.5/static/plpgsql.html
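
Two clauses in the definition above are easy to overlook. `STRICT` makes the function return NULL whenever an argument is NULL, without running the body, and `SECURITY DEFINER` makes it execute with the privileges of the function's owner rather than the caller's. A small sketch of calling the function directly follows; the `film_id` value 1 is an arbitrary choice for illustration.

```SQL
-- Call the function directly for a single film
-- (film_id 1 is used only as an example).
SELECT totalActors(1);

-- Because the function is declared STRICT, a NULL argument
-- yields NULL immediately, without running the function body.
SELECT totalActors(NULL) IS NULL AS returns_null;
```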



**Examining the function record:**

```SQL
dvdrental=# \df totalActors
                           List of functions
 Schema |    Name     | Result data type | Argument data types |  Type  
--------+-------------+------------------+---------------------+--------
 public | totalactors | integer          | vfilmid integer     | normal
(1 row)
```

Now, let's rewrite the SQL above to use our function!


In [3]:
%%sql
SELECT title, totalActors(film_id)
FROM film
ORDER BY title
LIMIT 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
10 rows affected.


title,totalactors
Academy Dinosaur,10
Ace Goldfinger,4
Adaptation Holes,5
Affair Prejudice,5
African Egg,5
Agent Truman,7
Airplane Sierra,5
Airport Pollock,4
Alabama Devil,9
Aladdin Calendar,8


In this case, we have used this simple function to simplify a query.

However, there is sometimes a trade-off!

In [4]:
%%sql
EXPLAIN ANALYZE
SELECT film.title, count(*)
FROM film_actor JOIN film USING (film_id)
GROUP BY film.title
ORDER BY film.title;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
13 rows affected.


QUERY PLAN
Sort (cost=262.65..265.15 rows=1000 width=23) (actual time=11.088..11.164 rows=997 loops=1)
Sort Key: film.title
Sort Method: quicksort Memory: 102kB
-> HashAggregate (cost=202.82..212.82 rows=1000 width=23) (actual time=8.170..8.359 rows=997 loops=1)
Group Key: film.title
-> Hash Join (cost=76.50..175.51 rows=5462 width=15) (actual time=1.324..5.066 rows=5462 loops=1)
Hash Cond: (film_actor.film_id = film.film_id)
-> Seq Scan on film_actor (cost=0.00..84.62 rows=5462 width=2) (actual time=0.048..1.120 rows=5462 loops=1)
-> Hash (cost=64.00..64.00 rows=1000 width=19) (actual time=1.224..1.224 rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 60kB


In [5]:
%%sql
EXPLAIN ANALYZE
SELECT title, totalActors(film_id)
FROM film
ORDER BY title

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
3 rows affected.


QUERY PLAN
Index Scan using idx_title on film (cost=0.28..350.92 rows=1000 width=19) (actual time=0.228..26.090 rows=1000 loops=1)
Planning time: 0.152 ms
Execution time: 26.230 ms


The query planner cannot optimize the full intent of the SQL that uses the function, because the function call is an opaque, atomic operation in the eyes of the planner.
Therefore, it **cannot** use any of its efficiency tricks for combining multiple tables, such as the _Hash Join_; instead, the function runs its own query once for every row of `film`.
The effect is that the SQL with the function, which looks simpler, is actually more than twice as slow (about 26 ms versus about 11 ms in the plans above).
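
The two plans also differ in row count: the join returns 997 rows while the function version returns 1000, most likely because the inner join drops films that have no rows in `film_actor`, whereas the function returns 0 for them. A sketch of a join that keeps every film while staying fully visible to the planner is shown below. It assumes `film_actor` has an `actor_id` column, as in the standard `dvdrental` schema, and no timing is claimed for it.

```SQL
-- LEFT JOIN keeps films without actors; counting actor_id
-- (instead of *) yields 0 for those films rather than 1.
SELECT film.title, count(film_actor.actor_id) AS totalactors
FROM film LEFT JOIN film_actor USING (film_id)
GROUP BY film.film_id, film.title
ORDER BY film.title;
```

Running this under `EXPLAIN ANALYZE` would show whether the planner still chooses a hash join for it.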

## SQL Function Example:

We can achieve the same effect using an SQL language function.

```SQL
CREATE OR REPLACE FUNCTION totalActors_SQL (vFilmID integer)
RETURNS integer AS $$
   SELECT count(*)::int
   FROM film_actor
   WHERE film_id = $1;
$$ 
SECURITY DEFINER 
STRICT
LANGUAGE SQL;
```

You will notice there are subtle differences in the constructs, such as the missing `BEGIN` and `END`.

```SQL
dvdrental=# \df totalActors_sql
                             List of functions
 Schema |      Name       | Result data type | Argument data types |  Type  
--------+-----------------+------------------+---------------------+--------
 public | totalactors_sql | integer          | vfilmid integer     | normal
(1 row)
```


In [None]:
%%sql
SELECT title, totalActors_sql(film_id)
FROM film
ORDER BY title
LIMIT 10;

In [None]:
%%sql
EXPLAIN ANALYZE
SELECT title, totalActors_sql(film_id)
FROM film
ORDER BY title

Conceptually, the SQL function is a parameterized query.
It is similar to the concept of the _prepared statement_ used for database access in procedural programming.
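
To make the comparison concrete, the same query can be written as a prepared statement directly in SQL. This is only a sketch: the statement name `total_actors_stmt` is hypothetical, and the argument 1 is an arbitrary `film_id`.

```SQL
-- Define a parameterized query; $1 is the integer parameter.
PREPARE total_actors_stmt (integer) AS
   SELECT count(*)::int
   FROM film_actor
   WHERE film_id = $1;

-- Run it with a concrete value.
EXECUTE total_actors_stmt(1);

-- Prepared statements live only for the current session;
-- they can also be removed explicitly.
DEALLOCATE total_actors_stmt;
```

Unlike a prepared statement, the function is stored in the database and is available to every session.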


In PostgreSQL and some other advanced DBMSs, functions are the tool with which _Triggers_ are written.
Triggers are coming in the next lesson.


# PostgreSQL Extensions

PostgreSQL is capable of supporting numerous extensions, in part because of the ease with which procedural code from a variety of languages can be added to the database.

For instance, a very popular extension to PostgreSQL is the PostGIS geospatial database extension.
This extension adds various data types, data, and functions to the database.
Functions are an element of this extension, usually named `public.st_*`.
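
To see what an extension has added, you can query the system catalogs. The sketch below assumes PostGIS is installed in the `public` schema of the current database; the `LIMIT` is there only to keep the listing short.

```SQL
-- List the extensions installed in the current database.
SELECT extname, extversion
FROM pg_extension
ORDER BY extname;

-- List some functions in the public schema whose names start with st_
-- (the backslash escapes the underscore, which is a LIKE wildcard).
SELECT p.proname, pg_get_function_arguments(p.oid) AS arguments
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
  AND p.proname LIKE 'st\_%'
ORDER BY p.proname
LIMIT 10;
```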

Example of PostGIS Query with function:

  * What are the three largest countries by area that have a `FIPS` code starting with `U`, based on border polygons?

This query brings a spatial measurement into the query condition. There are several ways of approaching this problem, but the most efficient is below:


```SQL
SELECT fips, iso3
 , (st_area(the_geom::geography)/(1000^2))::int as square_km
 -- Note, this FROM is using a schema.table name convention
FROM geospatial.country_borders 
WHERE fips LIKE 'U%' 
ORDER BY 2 DESC
LIMIT 3;


 fips | iso3 | square_km 
------+------+-----------
 UZ   | UZB  |    446706
 US   | USA  |   9469924
 UY   | URY  |    177843
(3 rows)


```

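**Note:** The above SQL needs to run in the regular `dsa_ro` database, not `dvdrental`. Also note that `ORDER BY 2` sorts by the second output column (`iso3`), which is why the rows above appear in descending `iso3` order; to rank by area, order by `square_km` instead.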