# Functions

This lesson covers:

* Importing modules
* Calling functions with more than one input and output
* Calling functions when some inputs are not used
* Writing a custom function

## Problem: Importing Modules

Python is a general-purpose programming language and is not specialized for
numerical or statistical computation. The core functionality that lets Python store
and access data efficiently, and that provides statistical algorithms, is
located in modules. The most important are:

* NumPy (`numpy`) - provides the basic array building block used throughout numerical
  Python
* pandas (`pandas`) - provides DataFrames, which store
  data in an easy-to-use format
* SciPy (`scipy`) - basic statistics and random number generators. The most
  important submodule is `scipy.stats`.
* matplotlib (`matplotlib`) - graphics. The most important submodule is
  `matplotlib.pyplot`.
* statsmodels (`statsmodels`) - statistical models such as OLS. The most
  important submodules are `statsmodels.api` and `statsmodels.tsa.api`.

Begin by importing the important modules.

```python
import numpy
import pandas
import scipy.stats
import matplotlib.pyplot
import statsmodels.api
import statsmodels.tsa.api
```
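After a plain `import`, every function is reached through the full module name, and submodules through a further dot. A minimal sketch using a small made-up array:

```python
import numpy
import scipy.stats

# Functions live inside the module namespace, so prefix them with the module name
x = numpy.array([1.0, 4.0, 9.0])
roots = numpy.sqrt(x)
print(roots)  # [1. 2. 3.]

# Submodules are reached with a further dot: module.submodule.object.method
p = scipy.stats.norm.cdf(0.0)
print(p)  # 0.5
```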

## Problem: Canonical Names

Use the `as` keyword to import the modules using their canonical names:

| Module              | Canonical Name |
| :------------------ | :------------- |
| numpy               | np             |
| pandas              | pd             |
| scipy               | sp             |
| scipy.stats         | stats          |
| matplotlib.pyplot   | plt            |
| statsmodels.api     | sm             |
| statsmodels.tsa.api | tsa            |

Import the core modules using `import` _module_ `as` _canonical_.

```python
import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.tsa.api as tsa
```
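The `as` keyword only creates a shorter name: the alias and the full name refer to the same module object, so either can be used. A quick check:

```python
import numpy
import numpy as np
import scipy.stats
import scipy.stats as stats

# The alias is just another name bound to the same module object
print(np is numpy)           # True
print(stats is scipy.stats)  # True

# So functions reached through either name are identical
print(np.sqrt is numpy.sqrt)  # True
```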

## Problem: Importing Individual Functions

1. Import `array`, `sqrt`, `log`, and `exp` from NumPy.
2. Import `OLS` from `statsmodels.regression.linear_model`.
3. Import the `stats` module from `scipy`.

```python
from numpy import array, sqrt, log, exp
from statsmodels.regression.linear_model import OLS
# Same effect as import scipy.stats as stats
from scipy import stats
```
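Importing individual functions places them directly in the current namespace, so a later import with the same name silently replaces the earlier one. A sketch of this pitfall, using `log` from NumPy and from the standard library's `math` module:

```python
import numpy as np
from numpy import log

x = np.array([1.0, np.e])
print(log(x))  # NumPy's log is applied element by element: [0. 1.]

# A later import rebinds the name: log is now math.log
from math import log

failed = False
try:
    log(x)
except TypeError as err:
    # math.log only accepts scalars, so an array with two elements fails
    failed = True
    print("math.log failed:", err)
```

Using the module prefix (`np.log`) avoids this kind of collision.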

Read the data in `momentum.csv` and store two of its columns, `mom_01` and `mom_10`, in
variables to avoid repeated typing.


```python
# Setup: Load the momentum data
import pandas as pd

momentum = pd.read_csv('data/momentum.csv')

print(momentum.head())

mom_01 = momentum['mom_01']
mom_10 = momentum['mom_10']
```

```
         date  mom_01  mom_02  mom_03  mom_04  mom_05  mom_06  mom_07  mom_08  \
0  2016-01-04    0.67   -0.03   -0.93   -1.11   -1.47   -1.66   -1.40   -2.08   
1  2016-01-05   -0.36    0.20   -0.37    0.28    0.16    0.18   -0.22    0.25   
2  2016-01-06   -4.97   -2.33   -2.60   -1.16   -1.70   -1.45   -1.15   -1.46   
3  2016-01-07   -4.91   -1.91   -3.03   -1.87   -2.31   -2.30   -2.70   -2.31   
4  2016-01-08   -0.40   -1.26   -0.98   -1.26   -1.13   -1.02   -0.96   -1.42   

   mom_09  mom_10  
0   -1.71   -2.67  
1    0.29    0.13  
2   -1.14   -0.45  
3   -2.36   -2.66  
4   -0.94   -1.32  
```


This data set contains 2 years of data on the 10 momentum portfolios from 2016–2018. The variables
are named `mom_XX`, where `XX` ranges from `01` (worst return over the past 12 months) to `10` (best
return over the past 12 months).

## Problem: Calling Functions
Functions were used in the previous lesson. Get used to calling functions by computing the mean,
standard deviation, skewness, kurtosis, maximum, and minimum of the 10 momentum portfolios. Also,
explore the help available for functions using the `?` operator. For example,

```python
momentum.std?
```

opens a help window that shows the inputs and outputs, while

```python
help(momentum.std)
```

prints the same help text directly. The `?` suffix only works in IPython and Jupyter, while `help`
works in any Python session.
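The text that `help` displays is the function's docstring, which is also available as an ordinary string through the `__doc__` attribute. This can be handy in scripts; a small sketch:

```python
import numpy as np

# The docstring shown by help() is stored on the function itself
doc = np.mean.__doc__
first_line = doc.strip().splitlines()[0]
print(first_line)
```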

```python
# Use the methods attached to the Series
print(mom_01.mean(), mom_01.std(), mom_01.skew(), mom_01.kurt())
print(mom_10.mean(), mom_10.std(), mom_10.skew(), mom_10.kurt())

# Use the NumPy functions and the statistics functions in SciPy.
# These match the pandas values up to bias-adjustment constants that depend only on the sample size.
import numpy as np
import scipy.stats as stats
print(np.mean(mom_01), np.std(mom_01), stats.skew(mom_01), stats.kurtosis(mom_01))
```

```
0.10190854870775348 1.7201674428556768 -0.10718993942161407 3.6858942336434177
0.06095427435387675 0.9514153243557435 -0.7699794641799718 2.6473273511803805
0.10190854870775348 1.718456684160085 -0.10687002235784658 3.6374521972731158
```
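The maximum and minimum follow the same pattern as the methods above. A sketch using a small Series built by hand from the first five `mom_01` values in the `head()` output, so it runs without the CSV file:

```python
import numpy as np
import pandas as pd

# First five daily returns of mom_01, typed in by hand
x = pd.Series([0.67, -0.36, -4.97, -4.91, -0.40])

# Methods attached to the Series
print(x.max(), x.min())  # 0.67 -4.97

# Equivalent NumPy functions
print(np.max(x), np.min(x))  # 0.67 -4.97
```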


## Problem: Calling Functions with 2 Outputs

Some useful functions return 2 or more outputs. One example is `stats.ttest_ind`, which
performs a t-test of the null hypothesis that two independent samples have equal means. It
returns the test statistic as the first output and the p-value as the second.

Use this function to test whether the means of `mom_01` and `mom_10` are different.

```python
# The full set of outputs is returned as a single tuple-like object
output = stats.ttest_ind(mom_01, mom_10)
print(output)

# Alternatively, list one variable per output to assign each component directly
test_stat, pvalue = stats.ttest_ind(mom_01, mom_10)
print(test_stat)
print(pvalue)
```

```
Ttest_indResult(statistic=0.46725641839109444, pvalue=0.6404178222702447)
0.46725641839109444
0.6404178222702447
```
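The components of the result can also be accessed by name rather than by position. A sketch with simulated data (the seed, sample sizes, and mean shift are arbitrary choices for illustration):

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
a = rng.standard_normal(100)
b = rng.standard_normal(100) + 0.5

result = stats.ttest_ind(a, b)

# Positional unpacking and named access return the same values
test_stat, pvalue = result
print(result.statistic, result.pvalue)
```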


## Problem: Calling Functions with 2 Inputs

Many functions take two or more inputs. Like outputs, the inputs are simply listed in order,
separated by commas. Use `np.linspace` to produce 11 points evenly spaced between 0 and 1.
Run `np.linspace?` to see its help, which lists the inputs in order (`start`, `stop`, `num`, ...).

```python
np.linspace(0, 1, 11)
```

```
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
```
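Inputs can also be passed by name, in which case their order no longer matters. The optional `endpoint` input controls whether the stop value is included. A short sketch:

```python
import numpy as np

positional = np.linspace(0, 1, 11)
named = np.linspace(num=11, start=0, stop=1)  # order does not matter with names
print(np.array_equal(positional, named))  # True

# endpoint=False excludes the stop value, so 10 points run from 0.0 to 0.9
no_end = np.linspace(0, 1, 10, endpoint=False)
print(no_end)
```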

## Problem: Calling Functions using Keyword Arguments

Many functions have optional arguments. You can see these in a docstring since
optional arguments take the form `variable=default`. For example, see
the help for `np.mean`

```python
np.mean?
```

which shows the signature

```python
np.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
```

This tells us that only `a` is required and that the other 4 inputs can
be omitted if the defaults are acceptable. To change an optional input, pass it by name in the
function call using the form `name=value`. (The exact signature differs slightly across NumPy
versions, but the pattern is the same.)
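For example, the optional `axis` input of `np.mean` chooses the direction of averaging, and `keepdims` preserves the number of dimensions. A sketch on a small 2-by-2 array:

```python
import numpy as np

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.mean(x))          # all elements: 2.5
print(np.mean(x, axis=0))  # down each column: [2. 3.]
print(np.mean(x, axis=1))  # across each row: [1.5 3.5]

# keepdims=True keeps the reduced axis with length 1
print(np.mean(x, axis=0, keepdims=True).shape)  # (1, 2)
```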

For example, a pandas DataFrame has a method `std` that computes the standard
deviation. By default, it divides by `n-1`. The `1` is set by the `ddof` (delta degrees of
freedom) input, so `ddof=0` divides by `n`.

Compute `std` using `ddof=0` on the momentum data.

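```python
# numeric_only=True skips the non-numeric date column, which pandas 2.0+ no longer drops automatically
momentum.std(ddof=0, numeric_only=True)
```

```
mom_01    1.718457
mom_02    1.136706
mom_03    0.920436
mom_04    0.816846
mom_05    0.755038
mom_06    0.738145
mom_07    0.708202
mom_08    0.719665
mom_09    0.772839
mom_10    0.950469
dtype: float64
```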

```python
momentum.std(numeric_only=True)  # Default is ddof=1, so these values are larger
```

```
mom_01    1.720167
mom_02    1.137837
mom_03    0.921353
mom_04    0.817660
mom_05    0.755789
mom_06    0.738880
mom_07    0.708907
mom_08    0.720382
mom_09    0.773608
mom_10    0.951415
dtype: float64
```
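The `ddof` rule can be checked by hand: the divisor is `n - ddof`. A sketch on a small made-up sample whose sum of squared deviations is 5.0:

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0])
n = x.shape[0]
sq_dev = ((x - x.mean()) ** 2).sum()  # sum of squared deviations = 5.0

# ddof sets the divisor to n - ddof
by_hand_0 = np.sqrt(sq_dev / (n - 0))
by_hand_1 = np.sqrt(sq_dev / (n - 1))
print(np.isclose(x.std(ddof=0), by_hand_0))  # True
print(np.isclose(x.std(), by_hand_1))        # True

# np.std defaults to ddof=0, unlike the pandas method
print(np.isclose(np.std(x), x.std(ddof=0)))  # True
```

This is why `np.std(mom_01)` printed a slightly smaller value than `mom_01.std()` earlier.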

## Problem: Writing a Custom Function
Custom functions will play an important role later in the course when estimating parameters.
Construct a custom function that takes three arguments, `x`, `mu`, and `sigma2`, and computes the
likelihood function of a normal random variable

$$f(x;\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

Use `def` to start the function and compute the likelihood at $x=0$ when $\mu=0$ and $\sigma^{2}=1$.

The text in the triple quotes is the docstring, which is optional but documents the function's
inputs and output.

```python
def normal_likelihood(x, mu, sigma2):
    """
    Compute the normal likelihood for a scalar value

    Parameters
    ----------
    x : float
        The point to evaluate
    mu : float
        The mean
    sigma2 : float
        The variance

    Returns
    -------
    float
        The likelihood value.
    """
    a = 1 / np.sqrt(2 * np.pi * sigma2)
    b = (x - mu) ** 2
    c = 2 * sigma2
    ll = a * np.exp(-b / c)
    # return sends the value back to the caller; without it the function returns None
    return ll

print(normal_likelihood(0, 0, 1))

# Built into SciPy stats, should match. The third argument (scale) is the
# standard deviation, not the variance; here both equal 1.
print(stats.norm.pdf(0, 0, 1))
```

```
0.3989422804014327
0.3989422804014327
```
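Function definitions can also give inputs default values, which is what produces the `variable=default` optional inputs seen in docstrings. A sketch of a hypothetical variant, `normal_likelihood_default`, that defaults to the standard normal. When comparing with SciPy, `scale` must be the standard deviation, `np.sqrt(sigma2)`:

```python
import numpy as np
import scipy.stats as stats


def normal_likelihood_default(x, mu=0.0, sigma2=1.0):
    """Normal likelihood with standard normal defaults (hypothetical helper)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)


# Only x is required; mu and sigma2 fall back to their defaults
print(normal_likelihood_default(0.0))

# Override an optional input by name
value = normal_likelihood_default(1.0, sigma2=4.0)

# SciPy's scale is the standard deviation, so pass sqrt(sigma2)
reference = stats.norm.pdf(1.0, loc=0.0, scale=np.sqrt(4.0))
print(np.isclose(value, reference))  # True
```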


## Exercises

### Exercise: Custom Function

Write a function named `summary_stats` that takes a single input, `x`, a DataFrame, and returns a
DataFrame with 4 columns and one row for each column of `x`. The 4 columns should contain the
mean, standard deviation, skewness, and kurtosis of each column of `x`.


### Exercise: Custom Function

Change your previous function to return 4 outputs, each a pandas Series, containing the mean,
standard deviation, skewness, and kurtosis.

Returning multiple outputs uses the syntax
```python
return w, x, y, z
```
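Python packs values returned this way into a tuple, which the caller can unpack into separate variables. A sketch with a hypothetical helper, `min_max`, that is unrelated to the exercise solution:

```python
import pandas as pd


def min_max(x):
    """Return the minimum and maximum of each column (hypothetical helper)."""
    return x.min(), x.max()


df = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [-2.0, 0.0, 4.0]})

# Unpack the two outputs into separate variables
lo, hi = min_max(df)
print(lo)
print(hi)

# Without unpacking, the outputs arrive together as a tuple
both = min_max(df)
print(type(both))  # <class 'tuple'>
```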