# The Mighty Function

From a software development point of view [functions (sometimes called subroutines)](https://en.wikipedia.org/wiki/Subroutine) are parameterized pieces of code with a single entry point.
Functions usually contain code to accomplish a specific task and, because they are parameterized, they can be called in different contexts whenever that task needs to be carried out. By avoiding duplicated code, functions significantly reduce the maintenance cost of programs and make them less bug-prone.  Furthermore, functions can be seen as a design tool for programs because they hide the implementation details of certain tasks from the user.  This is often called [abstraction](https://en.wikipedia.org/wiki/Abstraction_(computer_science)).

In most programming languages function declaration and calling work analogously to variable declaration and usage, in the sense that a function needs to be declared before you are allowed to call it within a program.  Once called, the statements in the function body are usually executed in their own local scope, often called the *function local scope*. When the function exits, that local scope disappears.
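
The lifetime of a function local scope can be observed directly in Python.  The following is a minimal sketch; the function `area` and the variable names are made up for illustration only.

```python
def area(width, height):
    # 'result' exists only in the function local scope of this call
    result = width * height
    return result

value = area(3, 4)

# Once the call has returned, the local scope is gone,
# so the name 'result' is unknown at this point.
try:
    result
    leaked = True
except NameError:
    leaked = False

print(value, leaked)
```

Only the returned value survives the call; the local name `result` does not.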

Functions come in two flavors,

* functions that compute return values and return these values to the caller at the call site, and
* functions that do not compute return values but accomplish their tasks via side effects such as writing out a result to the terminal or accessing a database.

Functions that do not compute return values are not allowed to appear in expressions because that would make an expression undefined.  Consider this Python snippet,

In [1]:
def f ():
    print ("Hello World!")
    return

try:
    print (1 + f())
except Exception as e:
    print ("Error: " + str(e))

Hello World!
Error: unsupported operand type(s) for +: 'int' and 'NoneType'


This program fails precisely because the function `f` does not return a value but appears in an expression passed to the `print` function. Some programming languages, in particular functional programming languages like [Haskell](https://www.haskell.org), do not permit functions without return values in order to avoid this problem of ill-defined expressions.
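
The `NoneType` in the error message comes from the fact that a Python function that returns without a value yields the special object `None`.  A small sketch that makes this visible (the function name `greet` is illustrative):

```python
def greet():
    # side effect only: nothing is returned explicitly
    print("Hello World!")

result = greet()

# The call still produces an object, namely None
print(result is None)
```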

In [2]:
# let the notebook access the code folder
import sys
sys.path.insert(1,"code")

# A Closer Look at Functions


<p align="center">
  <img width="600" height="450" src="figures/chap08/1/figure.jpg">
</p>
<p style="text-align: center;">
Fig. 1: Terminology of functions. 
</p>


Using functions in programs requires two things: a function *declaration* and a *call* site.  In Figure 1 we see a function declaration and a call in a C-like language whose syntax is not unlike the syntax of our Cuppa family of languages.  A function declaration introduces the name of a function, its return type, the kind of arguments it accepts and, of course, the function body which defines the computation of the function.  In the context of a function declaration the arguments of a function are called *formal arguments*; they act as placeholders for the values the function is going to be called with.  These formal arguments act just like variables in the body of the function.

Calling a function in most programming languages is simply referring to the name of the function in the program text together with a list of values to be passed to the function.  The values to be passed to the function are called *actual arguments*.

This immediately raises two fundamental questions regarding functions:
* How is the correspondence between actual and formal arguments established?
* How is the value of an actual argument transmitted to a formal argument?


## Argument Correspondence

Most programming languages use *positional arguments* where the first actual
argument is assigned to the first formal argument, the second actual argument is assigned to the second formal argument, and so on.  Python supports this kind of argument correspondence,

In [3]:
def f(first, second, third):
    print ("Value of 'first': {}".format(first))
    print ("Value of 'second': {}".format(second))
    print ("Value of 'third': {}".format(third))
    return

In [4]:
f(1,2,3)

Value of 'first': 1
Value of 'second': 2
Value of 'third': 3


Most modern programming languages also support *keyword arguments* where the names of the formal arguments are initialized in the actual argument list.  Here the formal argument names act like keywords, so the order of the argument initialization does not matter.  It turns out that Python supports this style of calling a function as well,

In [5]:
f(third=3, second=2, first=1)

Value of 'first': 1
Value of 'second': 2
Value of 'third': 3


Notice that here we reversed the order of the actual parameters but due to the fact that we used the formal parameter names as keywords the order did not matter and we still initialized the formal parameters in the function correctly.

The idea of keyword parameters is interesting in the sense that calls become self-documenting.  Consider this example,

In [6]:
def divide(dividend, divisor): 
   return dividend/divisor

In [7]:
# first call: positional correspondence
print(divide(2,4))

0.5


In [8]:
# second call: keyword correspondence
print(divide(dividend=2, divisor=4))

0.5


Considering that function declarations and calls can be thousands of code lines apart or even in different modules, it is clear that using keyword parameters exposes a little bit more about the workings of the function, reducing the need to constantly refer back to the function declaration when trying to understand how a piece of code works.

## Argument Value Transmission

The second question we raised above is how to connect the value of the actual argument to the formal argument of a function.  That is, how do we transmit an argument value at the call site of a function to the corresponding formal argument of the function? Two of the most popular techniques in use in today's programming languages are transmission *by value* and transmission *by reference*.

In by value argument passing a formal argument acts just like a local variable in a function declaration with one important difference: it is initialized using the value of the corresponding actual argument before the called function begins executing.  This method is sometimes also called *copy in* transmission because the value of the actual argument is copied into the function and is used to initialize the formal argument.  Because of its simplicity it is a widely used technique and, as a matter of fact, it is the only argument value transmission technique available in both C and Java.

In passing arguments by reference, the memory address or reference of the actual argument is computed before the called function executes.  Inside the called function, that memory address is used as the memory address of the corresponding formal argument.  In effect, the formal argument is now an alias for the actual argument.  That is, the formal argument accesses the same memory location as the actual argument.  This type of argument passing is very efficient for large objects because it avoids having to copy a large amount of memory into a function in order to initialize a formal argument, as would be the case for a by value call. By reference calling is used in Fortran and is also available in C++.
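
The observable difference between the two techniques can be imitated in Python.  Note that this is only a sketch: Python itself passes object references by value (often called *call by sharing*), so we simulate copy in transmission with an explicit copy and aliasing by handing over the same mutable object.  The helper `append_zero` is a made-up example function.

```python
import copy

def append_zero(values):
    # modifies whatever object the formal argument refers to
    values.append(0)
    return len(values)

original = [1, 2, 3]

# Simulated "by value": the callee works on a copy of the actual argument,
# so the caller's list is unaffected.
append_zero(copy.deepcopy(original))
after_copy_call = list(original)

# Alias behavior: the callee works on the very same object as the caller,
# so the modification is visible at the call site.
append_zero(original)
after_alias_call = list(original)

print(after_copy_call)   # [1, 2, 3]
print(after_alias_call)  # [1, 2, 3, 0]
```

The copy costs time and memory proportional to the size of the argument, which is the efficiency argument made above for by reference passing of large objects.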

# The Cuppa3 Language

In order to integrate functions into our family of languages we extend our Cuppa2 language (our language with variable declarations and scoping) to include function declarations and calling, and call it Cuppa3.  Here are some examples of what these features look like in Cuppa3,

* The statement `declare inc(x) return x+1;` declares the function `inc` with the single formal argument `x` and a function body `return x+1` that increments the value of `x` by one and returns that value.

* The statement `inc(3);` calls the function `inc` with a single actual argument `3`.  Being a call statement means that the return value of the function is effectively ignored.

* In the expression `4 + inc(1)` we call `inc` with the actual argument `1` and that call returns the value two which is then added to `4` resulting in a value six for the overall expression.

> In Cuppa3 we implement positional parameter correspondence and call by value.

<p align="center">
  <img width="450" src="figures/chap08/2/figure.png">
</p>
<p style="text-align: center;">
Fig. 2: The Cuppa3 grammar.
</p>

Figure 2 shows the full grammar for Cuppa3.
We can identify three statement level additions to the grammar that support functions,

1. Function declarations (line 6): `DECLARE ID '(' opt_formal_args ')' stmt`
1. Function call statements (line 11): `ID '(' opt_actual_args ')' opt_semi`
1. Return statements (line 12): `RETURN opt_exp opt_semi`

Notice that formal and actual arguments are defined as different syntactic classes (lines 20 and 26, respectively).  This is due to the fact that formal arguments can only be variables which, according to our call by value convention, will be initialized with the values of the actual arguments when the function is called.  The shape of actual arguments, on the other hand, is not restricted.  For a function call we are free to write things like,
```C
inc(get_value()*2+1)
```
where the actual parameters can be arbitrarily complex expressions including other function calls.

At the expression level we find one addition that shows the syntax for function calls within expressions (line 49),
```Python
ID '(' opt_actual_args ')'
```
The remainder of the grammar is virtually unchanged compared with the Cuppa2 grammar except for the fact that the above changes bring with them additional rules to deal with optional parameter lists and optional return statement expressions.

We extend the Cuppa2 lexer with the comma literal and the keyword `return` in order to obtain the Cuppa3 lexer. The comma literal is necessary for being able to write formal and actual parameter lists.  We also added the predicate `is_ID` to the lexer.  This function will help us to detect which parsing rule was fired at the statement level in the Cuppa3 front end.
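
To make the role of the two new tokens concrete, here is a minimal hand-written tokenizer sketch based on Python's `re` module.  It is not the lexer in `cuppa3_lex.py`; the token names, the keyword set, and the behavior of this `is_ID` helper are assumptions chosen for illustration.

```python
import re

# Assumed keyword set for this sketch
KEYWORDS = {"declare", "return", "put", "get", "while", "if", "else"}

ID_PATTERN = r"[A-Za-z_][A-Za-z_0-9]*"

TOKEN_SPEC = [
    ("INTEGER", r"\d+"),
    ("ID",      ID_PATTERN),
    ("COMMA",   r","),          # needed for argument lists
    ("OTHER",   r"[^\s\w,]"),   # any other single punctuation character
    ("SKIP",    r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ID" and value in KEYWORDS:
            kind = value.upper()    # keywords such as 'return' get their own token type
        tokens.append((kind, value))
    return tokens

def is_ID(text):
    # hypothetical predicate: True for identifiers that are not keywords
    return re.fullmatch(ID_PATTERN, text) is not None and text not in KEYWORDS

tokens = tokenize("declare add(a,b) return a+b;")
print(tokens)
```

In this sketch the comma separating `a` and `b` becomes a `COMMA` token and `return` becomes a `RETURN` token, while `add`, `a`, and `b` remain plain identifiers.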

The parser specification [(`cuppa3_gram.py`)](code/cuppa3_gram.py) and the lexer specification [(`cuppa3_lex.py`)](code/cuppa3_lex.py) for Cuppa3 are available in the [`code`](code) folder.  We can test our parser and lexer code to make sure it works as expected,

In [9]:
from cuppa3_lex import lexer
from cuppa3_gram import parser

program =\
'''
declare inc(x) return x+1;
put inc(1);
'''
parser.parse(program, lexer=lexer)

Generating LALR tables


## Cuppa3 Programs

We have seen a simple Cuppa3 program above where we declared the function `inc` and then called it in order to print out an incremented value.  Here are some slightly more complex programs.  The first one defines an `add` function which we then use to add two values together,
```C
declare add(a,b) 
{
    return a+b;
}

declare x = add(3,2);
put x;
```
Given an integer `n`, the next program sums all the integer values between 1 and `n`.  What is noteworthy about this program is that Cuppa3 allows us to declare functions within functions,
```C
declare seqsum(n) 
{
    
    declare add(a,b) return a+b;
    declare inc(x) return x+1;
        
    declare i = 1;
    declare sum = 0;
        
    while (i <= n) 
    {
        sum = add(sum,i);
        i = inc(i);
    }
        
    return sum;
}

put seqsum(10);
```
Here the functions `add` and `inc` are only available within the scope of function `seqsum`.
Finally, the last program we look at is the recursive implementation of the factorial of an integer value `x`,
```C
// recursive implementation of factorial
declare fact(x) 
{
     if (x <= 1)
        return 1;
     else 
        return x * fact(x-1);
}

// ask the user for input
declare v;
get v;
put fact(v);
```
The defining characteristic of this program is, of course, that it is recursive, that is, the function `fact` calls itself.  This program will be a great test case for our Cuppa3 language processors to come.

Fun fact: if we replace the `add` function in the `seqsum` program with a function that implements multiplication,
```C
declare mult(a,b) return a*b;
```
then the `seqsum` program, with the accumulator `sum` initialized to 1 instead of 0, can be seen as the iterative implementation of the factorial computation.
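
This relationship can be checked with a Python transcription of the two programs.  The sketch below mirrors the structure of `seqsum`, including the nested function declarations; the name `seqprod` is made up here.  Note that the accumulator of the multiplicative version has to start at 1, the multiplicative identity, since starting at 0 would always produce 0.

```python
import math

def seqsum(n):
    # nested functions are local to seqsum, as in the Cuppa3 program
    def add(a, b):
        return a + b
    def inc(x):
        return x + 1

    i = 1
    total = 0
    while i <= n:
        total = add(total, i)
        i = inc(i)
    return total

def seqprod(n):
    def mult(a, b):
        return a * b
    def inc(x):
        return x + 1

    i = 1
    product = 1   # accumulator starts at 1, not 0
    while i <= n:
        product = mult(product, i)
        i = inc(i)
    return product

print(seqsum(10))                         # 55
print(seqprod(5) == math.factorial(5))    # True
```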

# Function Local Scope

The body of a function executes in its own local scope often referred to as the *function local scope*.  That means formal parameters and any other kind of declarations within the function body are local to the function and therefore only available within the scope of the function.  In terms of our symbol table implementation that means we need to push a scope object on the symbol table stack before we execute or compile a function.

Consider the Cuppa3 program,
```C
declare add(a,b) return a+b;

put add(3,2);
```
Figure 3 shows how the body of the function `add` is executed within its own local scope.  Notice that the formal parameters `a` and `b` are declared as local variables within that local scope and that they are initialized with the values of the actual parameters.

<p align="center">
  <img width="600" height="450" src="figures/chap08/3/figure.jpg">
</p>
<p style="text-align: center;">
Fig. 3: Executing a function body within a function local scope. 
</p>
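
The steps shown in Figure 3 can be sketched in a few lines of Python.  This is not the Cuppa3 interpreter: the scope stack is modeled as a plain list of dictionaries, and the function body is represented by a Python callable instead of an AST.  All names are illustrative.

```python
def call_function(scope_stack, formal_args, body, actual_values):
    # 1. create the function local scope, declaring each formal argument
    #    as a local variable initialized with the corresponding actual value
    local_scope = dict(zip(formal_args, actual_values))
    # 2. push the scope before executing the body
    scope_stack.append(local_scope)
    try:
        # 3. execute the body within the function local scope
        return body(scope_stack[-1])
    finally:
        # 4. the local scope disappears when the function exits
        scope_stack.pop()

def add_body(scope):
    # stands in for the body 'return a+b'
    return scope["a"] + scope["b"]

scope_stack = [{}]   # the global scope
result = call_function(scope_stack, ["a", "b"], add_body, [3, 2])

print(result)             # 5
print(len(scope_stack))   # 1
```

After the call only the global scope remains on the stack, and `a` and `b` are no longer accessible.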

---
Watch an animation on function calls.

<a href="http://www.youtube.com/watch?feature=player_embedded&v=SCeWFmezs8M" target="_blank">
<img style='border:1px solid #000000' src="movie.jpg" width="120" height="90" />
</a>

---

# Static vs. Dynamic Scoping

There is an interesting interaction between function scopes and variables that are non-local to a function. Here is a  Cuppa3 program where the function `inc` refers to the variable `step` which is a non-local variable,
```C
declare step = 10;

declare inc(x) 
{
     return x+step;
}

// start a local scope
{
     declare step = 2;
     put inc(5);
}
```
The question is: Which value does `step` refer to within the body of `inc` when the function is called from the `put` statement in the local scope? We have two choices:

1. We can interpret the function body in the context of the current symbol table stack.  This would mean that the variable `step` in the function `inc` would be bound to the value 2.  Figure 4 illustrates the configuration of the symbol table stack for this interpretation.  Here the function scope object is just another scope object on the stack.  We refer to this as *dynamic scoping* because the interpretation of the non-local variable in the function depends on the current stack configuration which could be different for every function invocation.

    <p align="center">
      <img width="600" height="450" src="figures/chap08/4/figure.jpg">
    </p>
    <p style="text-align: center;">
    Fig. 4: Symbol table stack configuration for dynamic scoping of a function. 
    </p>
    <br>

1. We can interpret the function body in the context of the symbol table stack configuration when the function was declared.  In this case the function was declared as a global function and therefore when the function is called the function scope itself will refer back to the global scope in order to resolve non-local variable references.  Figure 5 illustrates this. Notice the red arrow indicating that the function scope uses the global scope in order to resolve the non-local variable references.  We call this *static scoping* since the scope object where the function was declared will never change.

    <p align="center">
      <img width="600" height="450" src="figures/chap08/5/figure.jpg">
    </p>
    <p style="text-align: center;">
    Fig. 5: Symbol table stack configuration for static scoping of a function. 
    </p>
    
The majority of programming languages implement static scoping since function behavior is more predictable and can be determined by just analyzing the source code.  However, there are languages like [Logo](https://en.wikipedia.org/wiki/Logo_(programming_language)) and [Emacs Lisp](https://en.wikipedia.org/wiki/Emacs_Lisp) that do implement dynamic scoping.
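
The two interpretations of the `step` program can be simulated with a small Python sketch.  Scopes are modeled as dictionaries and a stack configuration as a list of scopes; the function `call_inc` and its `scoping` parameter are assumptions made for this illustration and are not part of the Cuppa3 implementation.

```python
def lookup(name, scopes):
    # search from the innermost scope outward
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    raise NameError(name)

def call_inc(x, call_site_scopes, declaration_scopes, scoping):
    # dynamic scoping: the function scope sits on top of the current stack
    # static scoping: the function scope sits on top of the stack
    #                 configuration at declaration time
    enclosing = call_site_scopes if scoping == "dynamic" else declaration_scopes
    scopes = enclosing + [{"x": x}]
    # body of inc: return x+step
    return lookup("x", scopes) + lookup("step", scopes)

global_scope = {"step": 10}
block_scope = {"step": 2}

declaration_scopes = [global_scope]              # inc was declared globally
call_site_scopes = [global_scope, block_scope]   # inc is called inside the block

dynamic_result = call_inc(5, call_site_scopes, declaration_scopes, "dynamic")
static_result = call_inc(5, call_site_scopes, declaration_scopes, "static")

print(dynamic_result)   # 7
print(static_result)    # 15
```

Under dynamic scoping `step` resolves to the block's value 2, under static scoping it resolves to the global value 10.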

> In Cuppa3 we will implement static scoping.

# A Cuppa3 Interpreter

Our Cuppa3 interpreter follows our language processor architecture.  The Cuppa3 front end generates the AST and a tree walker then interprets this AST.
Before we look at the interpreter design itself, the first crucial insight to implementing functions in our interpreter is that function names act just like variable names in that they are the keys into our symbol table as follows,

* During function declaration we enter the function name into the symbol table.
* During a function call we search for the function name in the symbol table.

The second important insight is that the function body can be considered the value that we store with the function name in the symbol table.
During a function call we look up the function name in the symbol table and return the function body as the value for interpretation.  Figure 6 shows a program that declares a function and a couple of other global variables.  You can see that in all cases the name is used as the key into the symbol table.  The big difference is in the value that is associated with the names.  In the case of the variables `x` and `y` we see that scalar values are being stored, and in the case of the function name `inc` we see that the function body is stored in the symbol table as the associated value.
In order to make this possible we will extend our symbol table to distinguish between scalar values and function values.
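
One way to picture this distinction is a dictionary whose values carry a tag.  The following Python sketch is an assumption about how such tagging could look, not the actual Cuppa3 symbol table; the tag names, the helper functions, the tuple-based AST for the function body, and the values of `x` and `y` are all illustrative.

```python
symbol_table = {}

def declare_scalar(name, value):
    symbol_table[name] = ("scalar", value)

def declare_function(name, formal_args, body):
    # the function body (an AST) is the value stored under the function name
    symbol_table[name] = ("function", formal_args, body)

def lookup_scalar(name):
    entry = symbol_table[name]
    if entry[0] != "scalar":
        raise ValueError(f"{name} is not a scalar")
    return entry[1]

def lookup_function(name):
    entry = symbol_table[name]
    if entry[0] != "function":
        raise ValueError(f"{name} is not a function")
    return entry[1], entry[2]

# declare inc(x) return x+1;
inc_body = ("return", ("+", ("id", "x"), ("integer", 1)))
declare_function("inc", ["x"], inc_body)
declare_scalar("x", 1)
declare_scalar("y", 2)

formals, body = lookup_function("inc")
print(formals, body)
print(lookup_scalar("y"))
```

With the tags in place, using a scalar name in a call, or a function name as a variable, can be reported as an error instead of silently misinterpreting the stored value.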

<p align="center">
  <img width="600" src="figures/chap08/6/figure.jpg">
</p>
<p style="text-align: center;">
Fig. 6: Storing functions in the symbol table. 
</p>

---
Watch an animation of our symbol table in action during a function call.
<p>

<a href="http://www.youtube.com/watch?feature=player_embedded&v=oIbbpjohb5A" target="_blank">
<img style='border:1px solid #000000' src="movie.jpg" width="120" height="90" />
</a>

---

## The Symbol Table

As we mentioned above, the symbol table is extended to store two different kinds of objects:

* Scalars 
* Functions

We do this by adding type tags to the objects being stored as values in the symbol table.
We also extended the symbol table so that we can manipulate scopes appropriately in order to implement *static scoping* according to our Cuppa3 design decisions.


<p align="center">
  <img width="450"  src="figures/chap08/7/figure.png">
</p>
<p style="text-align: center;">
Fig. 7: Abbreviated Cuppa3 interpreter symbol table source. 
</p>

## The Frontend

As usual the front end specification consists of a [lexer](code/cuppa3_lex.py), a [parser](code/cuppa3_frontend.py), and a [state object](code/.  Let's take a look at the AST the front end generates for function related program snippets.

In [10]:
from cuppa3_lex import lexer
from cuppa3_frontend import parser

program =\
'''
declare inc(x) return x+1;
put inc(1);
'''
parser.parse(program, lexer=lexer)

## The Tree Walker

# Compiling Cuppa3 Code

In this section we compile Cuppa3 code, in particular functions, to Exp2bytecode.

In [11]:
from cuppa3_lex import lexer
from cuppa3_cc_frontend import parser
from cuppa3_cc_tree_rewrite import walk as rewrite
from cuppa3_cc_codegen import walk as codegen
from cuppa3_cc_output import output
from cuppa3_cc_state import state
from grammar_stuff import dump_AST

from cuppa3_cc import cc
from exp2bytecode_interp import interp as run

In [12]:
program = \
'''
declare x = (3 + 2) * 4
put x
'''

In [13]:
run(cc(program))

> 20


Step-by-step

In [14]:
state.initialize()
parser.parse(program,lexer=lexer)

In [15]:
dump_AST(state.AST)


(seq 
  |(declare x 
  |  |(* 
  |  |  |(paren 
  |  |  |  |(+ 
  |  |  |  |  |(integer 3) 
  |  |  |  |  |(integer 2))) 
  |  |  |(integer 4))) 
  |(seq 
  |  |(put 
  |  |  |(id x)) 
  |  |(nil)))


In [16]:
state.AST = rewrite(state.AST)
dump_AST(state.AST)


(seq 
  |(assign t$2 
  |  |(* t$1 
  |  |  |(+ t$0 
  |  |  |  |(integer 3) 
  |  |  |  |(integer 2)) 
  |  |  |(integer 4))) 
  |(seq 
  |  |(put 
  |  |  |(id t$2)) 
  |  |(nil)))


In [17]:
output_stream = output(codegen(state.AST))
print(output_stream)

	store t$0 (+ 3 2) ;
	store t$1 (* t$0 4) ;
	store t$2 t$1 ;
	print t$2 ;



In [18]:
run(output_stream)

> 20


In [19]:
program = \
'''
declare double_sum(a,b) 
{
    return (a+b)*2;
}
'''


Step-by-step

In [20]:
state.initialize()
parser.parse(program,lexer=lexer)

In [21]:
dump_AST(state.AST)


(seq 
  |(fundecl double_sum 
  |  |(seq 
  |  |  |(id a) 
  |  |  |(seq 
  |  |  |  |(id b) 
  |  |  |  |(nil))) 
  |  |(block 
  |  |  |(seq 
  |  |  |  |(return 
  |  |  |  |  |(* 
  |  |  |  |  |  |(paren 
  |  |  |  |  |  |  |(+ 
  |  |  |  |  |  |  |  |(id a) 
  |  |  |  |  |  |  |  |(id b))) 
  |  |  |  |  |  |(integer 2))) 
  |  |  |  |(nil)))) 
  |(nil))


In [22]:
state.AST = rewrite(state.AST)
dump_AST(state.AST)


(seq 
  |(fundef double_sum 
  |  |(seq %tsx[0] 
  |  |  |(seq %tsx[-1] 
  |  |  |  |(nil))) 
  |  |(block 
  |  |  |(seq 
  |  |  |  |(return 
  |  |  |  |  |(* %tsx[-3] 
  |  |  |  |  |  |(+ %tsx[-2] 
  |  |  |  |  |  |  |(id %tsx[0]) 
  |  |  |  |  |  |  |(id %tsx[-1])) 
  |  |  |  |  |  |(integer 2))) 
  |  |  |  |(nil))) 4) 
  |(nil))


In [23]:
output_stream = output(codegen(state.AST))
print(output_stream)

	jump L0 ;
#  
# Start of function double_sum
#  
double_sum:
	pushf 4 ;
	store %tsx[0] %tsx[-5] ;
	store %tsx[-1] %tsx[-6] ;
	store %tsx[-2] (+ %tsx[0] %tsx[-1]) ;
	store %tsx[-3] (* %tsx[-2] 2) ;
	store %rvx %tsx[-3] ;
	popf 4 ;
	return ;
	popf 4 ;
	return ;
#  
# End of function double_sum
#  
L0:
	noop ;



In [24]:
program = \
'''
declare double_sum(a,b) 
{
    return (a+b)*2;
}

declare x = double_sum(3,2);
put x;
'''


In [25]:
state.initialize()
parser.parse(program,lexer=lexer)

In [26]:
dump_AST(state.AST)


(seq 
  |(fundecl double_sum 
  |  |(seq 
  |  |  |(id a) 
  |  |  |(seq 
  |  |  |  |(id b) 
  |  |  |  |(nil))) 
  |  |(block 
  |  |  |(seq 
  |  |  |  |(return 
  |  |  |  |  |(* 
  |  |  |  |  |  |(paren 
  |  |  |  |  |  |  |(+ 
  |  |  |  |  |  |  |  |(id a) 
  |  |  |  |  |  |  |  |(id b))) 
  |  |  |  |  |  |(integer 2))) 
  |  |  |  |(nil)))) 
  |(seq 
  |  |(declare x 
  |  |  |(callexp double_sum 
  |  |  |  |(seq 
  |  |  |  |  |(integer 3) 
  |  |  |  |  |(seq 
  |  |  |  |  |  |(integer 2) 
  |  |  |  |  |  |(nil))))) 
  |  |(seq 
  |  |  |(put 
  |  |  |  |(id x)) 
  |  |  |(nil))))


In [27]:
state.AST = rewrite(state.AST)
dump_AST(state.AST)


(seq 
  |(fundef double_sum 
  |  |(seq %tsx[0] 
  |  |  |(seq %tsx[-1] 
  |  |  |  |(nil))) 
  |  |(block 
  |  |  |(seq 
  |  |  |  |(return 
  |  |  |  |  |(* %tsx[-3] 
  |  |  |  |  |  |(+ %tsx[-2] 
  |  |  |  |  |  |  |(id %tsx[0]) 
  |  |  |  |  |  |  |(id %tsx[-1])) 
  |  |  |  |  |  |(integer 2))) 
  |  |  |  |(nil))) 4) 
  |(seq 
  |  |(assign t$1 
  |  |  |(callexp t$0 double_sum 
  |  |  |  |(seq 
  |  |  |  |  |(integer 3) 
  |  |  |  |  |(seq 
  |  |  |  |  |  |(integer 2) 
  |  |  |  |  |  |(nil))))) 
  |  |(seq 
  |  |  |(put 
  |  |  |  |(id t$1)) 
  |  |  |(nil))))


In [28]:
output_stream = output(codegen(state.AST))
print(output_stream)

	jump L1 ;
#  
# Start of function double_sum
#  
double_sum:
	pushf 4 ;
	store %tsx[0] %tsx[-5] ;
	store %tsx[-1] %tsx[-6] ;
	store %tsx[-2] (+ %tsx[0] %tsx[-1]) ;
	store %tsx[-3] (* %tsx[-2] 2) ;
	store %rvx %tsx[-3] ;
	popf 4 ;
	return ;
	popf 4 ;
	return ;
#  
# End of function double_sum
#  
L1:
	noop ;
	pushv 2 ;
	pushv 3 ;
	call double_sum ;
	popv ;
	popv ;
	store t$0 %rvx ;
	store t$1 t$0 ;
	print t$1 ;



In [29]:
run(output_stream)

> 10


# Notes

Perhaps the earliest reference to the notion of a subroutine is by John Mauchly, the co-designer of the [ENIAC](https://en.wikipedia.org/wiki/ENIAC) computer, in his paper *Preparation of Problems for EDVAC-type Machines* which was presented at a symposium in 1947,

Curtiss, J. (1947). [*A Symposium of Large Scale Digital Calculating Machinery*](https://www.jstor.org/stable/2002294?seq=1#page_scan_tab_contents). Mathematical Tables and Other Aids to Computation, 2(18), 229-238. doi:10.2307/2002294

Our discussion of argument passing techniques was based on material from Adam Brook Webber's book,

Webber, A. B. (2010). [*Modern programming languages: A practical introduction*](https://fbeedle.com/our-books/13-modern-programming-languages-a-practical-introduction-2nd-ed-9781590282502.html). Franklin, Beedle & Associates Inc.


# Exercises

1. The Cuppa3 interpreter implements static scoping.  Change the implementation so that it supports
    dynamic scoping.  Show that your implementation works by demonstrating that the modified interpreter
    interprets the following program according to the definition of dynamic scoping:
    ```C
    declare step = 10;
    declare inc(x) {
         return x+step;
    }
    // start a local scope
    {
         declare step = 2;
         put inc(5);
    }
    ```