# Functions

## Definition

In [57]:
def my_function(arg1, arg2):
    '''Here I can say what my function does.
    I can use several lines
    arg1: It is a good idea to define the arguments as well
    arg2: Another one'''
    
    print(f'I am going to add {arg1} and {arg2} together')
    total = arg1 + arg2
    return total

In [62]:
add=my_function(1,2)
print(add)

I am going to add 1 and 2 together
3


## Type of arguments

Arguments can be specified in different ways. A function can have required and optional arguments, and you can pass arguments by position (as above) or by name.

In [64]:
# Pass arguments by name (keyword arguments)
add=my_function(arg2=1, arg1=2)
print(add)

# This is illegal: positional arguments must come before keyword arguments.
#add=my_function(arg2=1,2)

I am going to add 2 and 1 together
3


In [75]:
# Optional arguments:
def my_function(arg1, opt=None):
    # Compare with None explicitly so that a value such as 0 still counts as given
    if opt is not None:
        return arg1 + opt
    else:
        return arg1
    
print(my_function(1))
print(my_function(1,4))

1
5


In [76]:
# Optional arguments can also be used to define default values instead of nothing:
def add_5_or(arg1, opt=5):
    return(arg1+opt)

print(add_5_or(1))
print(add_5_or(1,6))

6
7


In [77]:
# Use lists/tuples or dictionaries to specify arguments
t=(1,4)
print(add_5_or(*t))

dic={'arg1':1, 'opt':4}
print(add_5_or(**dic))

5
5


You may remember the `**` operator from last week, where we used it with the `str.format()` method. It works in exactly the same way here.

Specifying arguments via a dictionary can be useful for passing options to plot routines. When tailoring plots, one usually ends up using a lot of options. Defining a dictionary for those makes the code more readable and makes it easier to reuse the options between plots.
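
As a minimal sketch of that idea, the snippet below reuses one dictionary of options across two calls. `describe_line` and its parameters are hypothetical stand-ins for a plotting routine, not a function from a real library.

```python
# Hypothetical stand-in for a plotting routine that takes many options.
def describe_line(label, color='black', linewidth=1.0, linestyle='-'):
    return f'{label}: color={color}, linewidth={linewidth}, linestyle={linestyle}'

# Define the shared options once...
style = {'color': 'red', 'linewidth': 2.5}

# ...and reuse them in every call with the ** operator.
obs_desc = describe_line('observations', **style)
model_desc = describe_line('model', linestyle='--', **style)

print(obs_desc)
print(model_desc)
```

Changing the look of both lines then only requires editing the `style` dictionary.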

## Global vs local scope
Variables defined inside a function are local: they are unknown outside the function.

Run the following code. Do you understand what is going on?

<div class="accordion" id="accordion">
  <div class="card">
    <div class="card-header" id="headingOne">
      <h5 class="mb-0">
        <button class="btn btn-link" type="button" data-toggle="collapse" data-target="#expl2" aria-expanded="true" aria-controls="expl2">
          Answer
        </button>
      </h5>
    </div>
  <div>
    <div id="expl2" class="collapse" aria-labelledby="headingOne" data-parent="#accordion">
      <div class="card-body">
          <p>The function defines a local object called total. This object only exists within the function. It is independent of the object defined outside the function. `total` defined outside the function is in the global scope. This means it can be used throughout the program, including inside function bodies.</p>
          <p>If a function defines a local variable with the same name as a global variable, you cannot use the global variable in that function.</p>
      </div>
    </div>
  </div>
</div>

In [82]:
total=0
def summing(arg1, arg2):
    total = arg1+arg2
    return total

add = summing(1,2)
print("After the function:", total)

total=2
def sum2(arg1, arg2):
    tt = arg1+arg2+total
    return tt
add = sum2(3,4)
print(add)

After the function: 0
9
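
A further sketch of how assignment decides scope: as soon as a function assigns to a name anywhere in its body, Python treats that name as local throughout the function, so reading it before the assignment fails. The names `counter`, `read_only` and `read_then_assign` are made up for this illustration.

```python
counter = 10

def read_only():
    # No assignment to counter here, so the global value is used.
    return counter + 1

def read_then_assign():
    # The assignment below makes counter local to the whole function,
    # so the read on the right-hand side happens before it has a value.
    counter = counter + 1
    return counter

print(read_only())

try:
    read_then_assign()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'

print(outcome)
print(counter)   # the global variable is unchanged
```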


## Pass by Reference vs Value
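Python passes arguments by reference to the object. The name inside the function and the name outside it point to the same object in memory, so modifying a mutable object in place is visible outside the function, while assigning a new object to the name is not.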

Do you understand what is going on in the following example?

<div class="accordion" id="accordion">
  <div class="card">
    <div class="card-header" id="headingOne">
      <h5 class="mb-0">
        <button class="btn btn-link" type="button" data-toggle="collapse" data-target="#expl3" aria-expanded="true" aria-controls="expl3">
          Explanation
        </button>
      </h5>
    </div>
  <div>
    <div id="expl3" class="collapse" aria-labelledby="headingOne" data-parent="#accordion">
      <div class="card-body">
          <p>In changeme(), li refers to the same list object as the one passed in, so its values get modified even though the function does not return anything.</p>
          <p>In changeme2(), the assignment makes li a local variable pointing to a new list. The list passed as an argument is left untouched.</p>
          <p>Note that the behaviour would be the same without passing li as an argument, since li is also a global object.</p>
          <p>What would happen if li was a tuple instead of a list?</p>
      </div>
    </div>
  </div>
</div>

In [88]:
def changeme(li):
    li[0] = 3
    return

def changeme2(li):
    li = [5,4]
    return

li=[1,2]
print(f"List at the start: {li}")

changeme(li)
print(f"List after changeme: {li}")

changeme2(li)
print(f"List after changeme2: {li}")

List at the start: [1, 2]
List after changeme: [3, 2]
List after changeme2: [3, 2]
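
One way to see the difference between modifying an object and rebinding a name is to compare object identities with `is`. This is a small sketch with made-up function names (`mutate`, `rebind`), not part of the original example.

```python
def mutate(seq):
    # Changes the object the caller also refers to.
    seq.append(99)
    return seq

def rebind(seq):
    # Points the local name at a brand-new list; the caller's list is untouched.
    seq = seq + [99]
    return seq

original = [1, 2]

same = mutate(original)
print(same is original, original)

new = rebind(original)
print(new is original, original, new)
```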


---

# Read in files, Write to files

### Open and close files

To open a text file, use the `open()` function. Its two main arguments are the name of the file and the opening mode.

| Mode | Meaning |
|:------:|--------|
| r   | read-only mode (default) |
| w   | write-only mode; an existing file is overwritten |
| a   | append to the end of the file; the file is created if it does not exist |
| r+  | read and write mode |

To read in data, there are 3 methods: `read()`, `readline()` and `readlines()`. They differ in how much data they read and in what they return. `read()` returns the whole file as one string, or only the given number of characters. `readline()` returns a single line each time it is called. `readlines()` returns a list with all the remaining lines of the file (an optional size hint limits how much is read).

To close a file, use the `close()` method.

### Read from file

In [9]:
# f.seek(0) rewinds the file to its start after each read.
# Check what each output looks like. What is the difference between `f.read()` and `f.readlines()`?
f = open('test.txt','r')
whole_file = f.read()
f.seek(0)
first_line = f.readline()
f.seek(0)
whole2 = f.readlines()
f.close()
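
To make the difference concrete without relying on the contents of `test.txt`, here is a sketch that first writes a small temporary file (the file name and its three lines are invented for the example) and then reads it back with each method.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.txt')
f = open(path, 'w')
f.write('line one\nline two\nline three\n')
f.close()

f = open(path, 'r')
whole_file = f.read()      # one string with the whole content
f.seek(0)
first_chars = f.read(4)    # one string with the first 4 characters
f.seek(0)
first_line = f.readline()  # one string, up to and including the first newline
f.seek(0)
all_lines = f.readlines()  # a list with one string per line
f.close()

print(repr(whole_file))
print(repr(first_chars))
print(repr(first_line))
print(all_lines)
```

Printing with `repr()` makes the newline characters visible.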

### Write to file

Writing to a file is pretty much symmetrical to reading from one:

In [89]:
f = open('my_file.txt','w')
f.write('Hello!')
lines=['Other line', 'One more']
f.writelines(lines)
f.close()

Hmm, something went wrong: everything ends up on a single line. Python does not add line breaks for you, so you need to mark the end of each line with the newline character `\n`:

In [90]:
f = open('my_file.txt','w')
f.write('Hello!\n')
lines=['Other line\n', 'One more\n']
f.writelines(lines)
f.close()
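
To check the difference, the sketch below writes both versions and reads the result back. It uses a temporary path only to keep the example self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'my_file.txt')

# Without newline characters, everything ends up on one line.
f = open(path, 'w')
f.write('Hello!')
f.writelines(['Other line', 'One more'])
f.close()

f = open(path, 'r')
without_newlines = f.read()
f.close()

# With explicit newlines, each string becomes its own line.
f = open(path, 'w')
f.write('Hello!\n')
f.writelines(['Other line\n', 'One more\n'])
f.close()

f = open(path, 'r')
with_newlines = f.readlines()
f.close()

print(repr(without_newlines))
print(with_newlines)
```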

### With statement

It is also possible to use the `with` statement to work with files. This is commonly used as it provides better error handling and closes the file for you when the block ends, even if an error occurred inside it.

In [11]:
with open('test.txt','r') as f:
    first_line = f.readline()

print(first_line)
second_line = f.readline()
print(second_line)

This is an example text file.



ValueError: I/O operation on closed file.
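
The error occurs because the file is closed as soon as the `with` block ends. The following sketch, using a temporary file invented for the example, checks the `closed` attribute inside and after the block.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as f:
    f.write('first\nsecond\n')

with open(path, 'r') as f:
    open_inside = not f.closed   # the file is open inside the block
    first_line = f.readline()
    second_line = f.readline()   # read everything you need before leaving the block

closed_after = f.closed          # the file was closed on leaving the block

print(open_inside, closed_after)
print(first_line, second_line, sep='')
```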

### Exercise
Create a list of the numerical tabular values in test.txt. Make sure the values are of a numeric type (hint: check the Python builtin functions [here](https://docs.python.org/3/library/functions.html#built-in-functions)).

Format the list as you wish:

`[50,30,40,70,20,30]`

`[[50,30,40],[70,20,30]]`

`[[50,70],[30,20],[40,30]]`

<div class="accordion" id="accordion">
  <div class="card">
    <div class="card-header" id="headingOne">
      <h5 class="mb-0">
        <button class="btn btn-link" type="button" data-toggle="collapse" data-target="#collapse1" aria-expanded="true" aria-controls="collapse1">
          Answer
        </button>
      </h5>
    </div>
  <div>
    <div id="collapse1" class="collapse" aria-labelledby="headingOne" data-parent="#accordion">
      <div class="card-body">
<pre><code>
with open('test.txt','r') as f:
    # skip the header
    head_length=2 # number of lines in the header
    for i in range(head_length):
        f.readline()
    
    # Create a list to store the data
    li = []
    li2 = []
    li3 = []
    # Read each line and parse as needed.
    for line in f.readlines():
        tt = line.split(',')
        tmp = [int(numb) for numb in tt]
        li.extend(tmp)
        li2.append(tmp)
        if li3 == []:
            li3=[[n] for n in tmp]
        else:
            for ind in range(len(tmp)):
                li3[ind].append(tmp[ind])
</code></pre>
      </div>
    </div>
  </div>
</div>


--------
# Additional packages

When you start Python very little gets loaded by default. This is to ensure a quick start of the interpreter and a lower memory usage. Obviously, you will need more than the default.

Additionally, Python is open source and, as such, many additional packages have been contributed over the years. These packages need to be installed before you can use them.

There are several ways to install packages. A simple one for individuals is Anaconda or Miniconda. That is what you used to prepare for this training (remember the instructions sent before the first training?). One advantage is that it handles dependencies on other packages and non-Python libraries for you. One disadvantage is that not all packages are shared via conda. It also creates a lot of files, which is not good for NCI.

For working at NCI, the CMS maintains several Python environments to avoid duplication. These are quite extensive, and we are open to installing more packages (as long as they are compatible with the existing environment). Please try those environments before installing your own. They are publicly accessible, so not just for people from the Centre.

```
module use /g/data/hh5/public/modules
module load conda
```

This will load the stable environment for Python 3, which is most likely the one you want to use. A list of the packages under this environment can be found with: `conda list`

### Load packages for use in your scripts or notebooks
You can import packages at any point in your script. It is usually done at the top, but it doesn't have to be.

In [3]:
import numpy    # Most basic form. Imports the whole package
import numpy as np   # Imports the whole package and gives it an alias to save on typing
from matplotlib import pyplot as plt   # Imports just one part of the package
import matplotlib.pyplot as plt   # Same result as the line above

In [4]:
# To use a package:
a = np.arange(20)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

### Some useful packages

From basic Python install:
 - <span style='color:orangered'>os</span>: operating system, e.g. environment variables, working directory, change permissions on files and directories.
 - <span style='color:orangered'>os.path</span>: pathname manipulations, e.g. separate or join basename and file name, check file existence.
 - <span style='color:orangered'>shutil</span>: file operations, e.g. copy, move, delete files
 - <span style='color:orangered'>glob</span>: pathname pattern expansion, e.g. list of files matching: './[0-9].*'
 - <span style='color:orangered'>argparse</span>: parser for command-line options.
 - <span style='color:orangered'>subprocess</span>: to run a separate program.
 
Additional packages:
 - <span style='color:orangered'>numpy</span>: arrays in Python
 - <span style='color:orangered'>scipy</span>: more maths functions (FFT, ODE, linear algebra, interpolation etc.)
 - <span style='color:orangered'>pandas</span>: the go-to package for working with time series
 - <span style='color:orangered'>xarray</span>: better arrays in Python (labelled arrays)
 - <span style='color:orangered'>matplotlib</span>: plotting in Python
 - <span style='color:orangered'>cartopy</span>: map projection and plotting in Python
 - <span style='color:orangered'>dask</span>: parallelisation 
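
As a rough illustration of some of the standard-library modules listed above, this sketch creates a few files in a temporary directory, then lists, copies and inspects them. The file names are invented for the example.

```python
import glob
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()

# Create three empty files to play with.
for name in ['run1.txt', 'run2.txt', 'notes.md']:
    open(os.path.join(work_dir, name), 'w').close()

# glob: list the files matching a pattern (sorted, as the order is not guaranteed).
txt_files = sorted(glob.glob(os.path.join(work_dir, 'run[0-9].txt')))
txt_names = [os.path.basename(p) for p in txt_files]

# shutil: copy a file; os.path: check that the copy exists.
copy_path = os.path.join(work_dir, 'run1_backup.txt')
shutil.copy(txt_files[0], copy_path)
copy_exists = os.path.exists(copy_path)

# os.path: split a file name from its extension.
stem, ext = os.path.splitext(os.path.basename(copy_path))

print(txt_names, copy_exists, stem, ext)

# shutil: remove the whole temporary directory again.
shutil.rmtree(work_dir)
```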

----
# Numpy and arrays

### Create an array
There are plenty of ways to do so depending on what you want. The low-level function is `np.ndarray()` which you probably won't use much. But the [webpage](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) for this function is interesting as it lists all attributes and methods associated with numpy arrays!

In [11]:
# Low-level: the array is not initialised, so it contains arbitrary values
rr = np.ndarray(shape=(2,4,3),dtype=float)
rr

array([[[-2.00000000e+000,  2.00389627e+000,  7.90505033e-323],
        [ 0.00000000e+000,  7.71515278e+199,  5.02034658e+175],
        [ 4.76786488e-038,  2.77197457e-057,  5.05464895e-038],
        [ 2.34303564e-056,  1.47763641e+248,  1.16096346e-028]],

       [[ 7.69165785e+218,  1.35617292e+248,  1.62602156e+185],
        [ 8.38994136e+165,  7.11405995e-038,  7.12195893e-067],
        [ 4.25850026e-096,  6.32299154e+233,  6.48224638e+170],
        [ 5.22411352e+257,  5.74020278e+180,  8.37174974e-144]]])

In [15]:
# From list or tuple
rr = np.array([[3.,4.],[5.,6.]])
print(rr)
rr1 = np.array([['Claire','Paola'],['Scott','Danny']])  # It doesn't have to be a numerical type  
print(rr1)
rr2 = np.array([['Claire',10], ['Paola', 6]]) # Mixed types are accepted, but numpy converts everything to one common type (here strings)
print(rr2)

[[3. 4.]
 [5. 6.]]
[['Claire' 'Paola']
 ['Scott' 'Danny']]
[['Claire' '10']
 ['Paola' '6']]
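
The `dtype` attribute shows what numpy did with mixed input. This is a small sketch and the variable names are arbitrary.

```python
import numpy as np

mixed = np.array([['Claire', 10], ['Paola', 6]])
numbers = np.array([1, 2.5])

# Strings and integers together: everything is stored as strings.
print(mixed.dtype.kind)     # 'U' stands for unicode string
print(mixed[0, 1] == '10')

# Integers and floats together: everything is stored as floats.
print(numbers.dtype)
```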


In [17]:
# Initialise to 0 or 1.
rr = np.zeros((2,3),dtype=float)
print(rr)
rr1 = np.ones((2,4))
print(rr1)

[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [18]:
# Same shape as an existing array
rr2 = np.zeros_like(rr1)
rr2

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [19]:
# Evenly spaced values
rr2= np.arange(5,45,2)
rr2

array([ 5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37,
       39, 41, 43])

In [21]:
# Reshaping an existing array
rr2 = rr2.reshape((5,2,2))
rr2

array([[[ 5,  7],
        [ 9, 11]],

       [[13, 15],
        [17, 19]],

       [[21, 23],
        [25, 27]],

       [[29, 31],
        [33, 35]],

       [[37, 39],
        [41, 43]]])

### Read data from file
Do you remember the csv example above? Here it is with numpy.

In [40]:
li = np.loadtxt('test.txt',delimiter=',',skiprows=2)
print(li)
# For the third format example, simply take the transpose
print(li.T)
# You want the columns in separate arrays?
c1,c2,c3 = np.loadtxt('test.txt', delimiter=',',skiprows=2,unpack=True)
print(c1,c2,c3)

[[50. 30. 40.]
 [70. 20. 30.]]
[[50. 70.]
 [30. 20.]
 [40. 30.]]
[50. 70.] [30. 20.] [40. 30.]
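
`np.loadtxt` also accepts file-like objects, which makes it easy to try out without a file on disk. In this sketch the text mimics what `test.txt` might look like: the second header line is invented, and the numbers are taken from the output above.

```python
import io
import numpy as np

# Invented stand-in for test.txt: two header lines, then comma-separated values.
text = """This is an example text file.
a,b,c
50,30,40
70,20,30
"""

data = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=2)
print(data.shape)

# dtype=int gives integers instead of the default floats.
data_int = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=2, dtype=int)
print(data_int.tolist())
```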


## Indexing
Indexing works as for lists and other sequences, except that you give one index or slice per dimension, separated by commas:

In [28]:
print(f"First element {rr2[0,0]}\n")
print(f"First index of the second dimension \n {rr2[:,0,:]}\n")
print(f"First 2 indexes along the 1st dimension and all other indexes along other dimensions\n {rr2[:2,:,:]}\n")
print(f"Stride {rr2[0:5:2,0,1]}\n")

First element [5 7]

First index of the second dimension 
 [[ 5  7]
 [13 15]
 [21 23]
 [29 31]
 [37 39]]

First 2 indexes along the 1st dimension and all other indexes along other dimensions
 [[[ 5  7]
  [ 9 11]]

 [[13 15]
  [17 19]]]

Stride [ 7 23 39]



The ellipsis (`...`) is a generic way to say "all other indexes along all other dimensions" without specifying the number of dimensions in your array. It can stand for all the dimensions before or after the ones you specify:

In [30]:
print(f"Specify slices in all dimensions: \n{rr2[:2,:,:]}\n")
print(f"Generic form:\n{rr2[:2,...]}\n")
print(f"Any number of dimensions specified before:\n{rr2[:2,0,...]}\n")
print(f"Works for the start of the array as well:\n{rr2[...,0]}\n")

Specify slices in all dimensions: 
[[[ 5  7]
  [ 9 11]]

 [[13 15]
  [17 19]]]

Generic form:
[[[ 5  7]
  [ 9 11]]

 [[13 15]
  [17 19]]]

Any number of dimensions specified before:
[[ 5  7]
 [13 15]]

Works for the start of the array as well:
[[ 5  9]
 [13 17]
 [21 25]
 [29 33]
 [37 41]]
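
Because `...` does not depend on the number of dimensions, it is handy in functions that should work on arrays of any shape. A minimal sketch, with a made-up helper name `last_axis_first`:

```python
import numpy as np

def last_axis_first(arr):
    # Select index 0 along the last dimension, whatever the number of dimensions.
    return arr[..., 0]

a2d = np.arange(6).reshape((2, 3))
a4d = np.arange(24).reshape((2, 3, 2, 2))

print(last_axis_first(a2d).shape)
print(last_axis_first(a4d).shape)
```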



### Matlab users
In Matlab, arrays are matrices. That is not true in Python. This means that in Matlab `*` is matrix multiplication, whereas in Python `*` on numpy arrays multiplies element by element. For matrix multiplication in Python, use the `@` operator.
This [page](https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html#numpy-for-matlab-users-notes) provides a long table of equivalents between Matlab and Python.
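
A short sketch of the difference, using two small arrays chosen for the example: `*` multiplies element by element, while the `@` operator performs the matrix product.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

elementwise = a * b   # multiply matching elements
matrix = a @ b        # matrix product, like a*b in Matlab

print(elementwise)
print(matrix)
```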

## Operations with arrays along some given axis
`numpy` has a lot of handy functions for common operations. For example, if you want the mean of an array:

In [41]:
rr2.mean()

24.0

That's handy, but even handier is the ability to compute the mean over a given dimension only. For example, `rr2` is 3D. Let's say the dimensions are time, latitude and longitude respectively, and you want the time average at each spatial point:

In [42]:
rr2.mean(axis=0)  # Remember indexes start at 0

array([[21., 23.],
       [25., 27.]])
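
The `axis` argument also accepts a tuple, which collapses several dimensions at once. The sketch below rebuilds `rr2` as above and, keeping the (time, latitude, longitude) interpretation as an assumption, also computes a spatial mean for each time step.

```python
import numpy as np

rr2 = np.arange(5, 45, 2).reshape((5, 2, 2))

time_mean = rr2.mean(axis=0)          # average over time: one value per grid point
spatial_mean = rr2.mean(axis=(1, 2))  # average over space: one value per time step

print(time_mean.shape, spatial_mean.shape)
print(spatial_mean)
```

The shape of each result is the shape of `rr2` with the averaged dimensions removed.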