# Being more advanced with `numpy` 

This Notebook gathers material that is useful when working with numpy, but not necessary for a first contact with this module. 

This module expands on some sections covered in [Modules_in__python_numpy.ipynb](Modules_in__python_numpy.ipynb). 

## Table of Contents


* [II.2 Array copies and views](#II.2a)
* [II.3 Shape manipulation](#II.3a)
* [II.5 Reading arrays from a file and string formatting](#II.5a)
* [II.6 Useful Numpy functions](#II.6a)

In [None]:
import numpy as np

### II.2 Array copies and views <a class="anchor" id="II.2a"></a>

A slicing operation creates a **view** on the original array, which is just another way of accessing the same data: the original array is not copied in memory. You can use `np.may_share_memory()` to check whether two arrays may share the same memory block.
To prevent this behaviour and create a brand new array from a slice of the original one, so that *modifying it leaves the original untouched*, use the method `copy()`: `c = a[0:2].copy()` creates a **new array** that is a **copy** of the first two elements of `a`. 

**When modifying the view, the original array is modified as well**. Try the cells below to understand how memory allocation works. 

In [None]:
#import numpy as np
a = np.arange(10)
a

In [None]:
b = a[::2]
b

In [None]:
np.may_share_memory(a, b)

In [None]:
b[0] = 12
b

In [None]:
a   # (!)

In [None]:
a = np.arange(10)
print(a) 
c = a[::2].copy()  # force a copy
c[0] = 12
a

In [None]:
np.may_share_memory(a, c)

Note that there is also a numpy function called `np.shares_memory()` that performs an exact test of memory sharing (it checks whether the two arrays really have elements in common in memory). This function can be much more computationally intensive than `np.may_share_memory()`, which only compares memory bounds and can therefore yield false positives.  

See this [link](https://stackoverflow.com/questions/44865261/what-is-the-difference-between-numpy-shares-memory-and-numpy-may-share-memory) for more details about this subtle difference. 
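
The difference can be illustrated with a minimal sketch (the array and variable names below are only chosen for illustration): two interleaved slices of the same array occupy overlapping memory ranges but have no element in common.

```python
import numpy as np

a = np.arange(10)
evens = a[::2]    # view on elements 0, 2, 4, 6, 8
odds = a[1::2]    # view on elements 1, 3, 5, 7, 9

# The two views interleave in memory without having any element in common.
maybe = np.may_share_memory(evens, odds)   # memory bounds overlap -> True (false positive)
exact = np.shares_memory(evens, odds)      # exact check           -> False
print(maybe, exact)
```

Here `np.may_share_memory()` answers "maybe", while `np.shares_memory()` confirms that modifying `evens` can never change `odds`.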

### II.3 Array shape manipulation <a class="anchor" id="II.3a"></a>


- **II.3.1 Reshaping**:   
The method `reshape(newshape)` allows one to reorganise the elements of an array, to create a "new" array (see below) that has a different shape. The total number of items of the array has to be the same ! 

In [None]:
print(a) 
a.shape

In [None]:
b = a.ravel()
b = b.reshape((2, 5))   # a holds 10 elements here, so the new shape must also hold 10 (2 x 5)
b

In [None]:
# Alternatively 
a.reshape((2, -1))    # unspecified (-1) value is inferred

**WARNING:** Reshaping may return a **view** or a **copy** !

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b=a.ravel()
b=b.reshape((2,3))
# Let's modify b and show a to see if we have a view or a copy ... 
b[0, 0] = 99
a


In [None]:
# let's now create an array with np.zeros and reshape it 
a = np.zeros((3, 2))
b = a.T.reshape(3*2)
b[0] = 9
a


To understand this you need to learn more about the memory layout of a numpy array. This is beyond the scope of this lecture. 
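
Without going into the details of the memory layout, you can at least check which case you are in. The sketch below is only one possible way to do so; it relies on `np.shares_memory()` introduced in II.2.

```python
import numpy as np

a = np.zeros((3, 2))

v = a.reshape(6)       # a is contiguous in memory: reshaping only needs a new view
c = a.T.reshape(6)     # a.T is not contiguous in row order: the data have to be copied

print(np.shares_memory(a, v))   # True  -> v is a view
print(np.shares_memory(a, c))   # False -> c is a copy
```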

- **II.3.2 Flattening**:    
The method `ravel()` flattens the array into a 1-D array (each row of the array is appended after the previous one). `reshape(-1)` allows you to do this operation as well. 

In [None]:
#import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a) 
print(a.ravel())
a

In [None]:
a.T   # Transpose the array

In [None]:
a.T.ravel()

**Note**: `a.T` is a property of the array `a` that returns the transposed array, while `np.transpose(a)` is a function (and `a.transpose()` a method) that does the same. All of them return a view of the original array. As `a.T` is a property of the object `a`, it tends to be slightly quicker than a function call, as you can test using the `%timeit` magic command. For N-dimensional arrays, `transpose()` allows a bit more than just transposing (see II.3.4 below).

In [None]:
%timeit(a.transpose())
%timeit(a.T)

- **II.3.3 Adding a dimension**:

Indexing with the `np.newaxis` object allows us to add an axis to an array. You can also use the `reshape` method.  

In [None]:
z = np.array([1, 2, 3])
print(z.shape)
z

In [None]:
print(z[:, np.newaxis])
z[:, np.newaxis].shape

In [None]:
z[np.newaxis, :]

In [None]:
z[np.newaxis, :].shape

In [None]:
# An alternative is to reshape your array
y = np.array([1, 2, 3])

# When one shape dimension is -1, the value is inferred from the length of the array and remaining dimensions.
y = y.reshape((-1,1))   
y.shape

In [None]:
y = np.array([1, 2, 3])
y = y.reshape((1,-1))
y.shape

- **II.3.4. Dimension shuffling**:

In [None]:
a = np.arange(4*3*2).reshape(4, 3, 2)
a.shape

In [None]:
a

In [None]:
a[1, 2, 0]

In [None]:
b = a.transpose(1, 2, 0)
b.shape

In [None]:
b

In [None]:
b[2, 1, 0]
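
The cells above can be summarised with a short sketch of how `transpose(1, 2, 0)` maps the indices (the index names `i`, `j`, `k` are only used for illustration):

```python
import numpy as np

a = np.arange(4 * 3 * 2).reshape(4, 3, 2)
b = a.transpose(1, 2, 0)   # new axis 0 <- old axis 1, new axis 1 <- old axis 2, new axis 2 <- old axis 0

print(b.shape)                     # (3, 2, 4)

# Element (i, j, k) of b is element (k, i, j) of a
i, j, k = 2, 1, 0
print(b[i, j, k] == a[k, i, j])    # True

# Without arguments, transpose() reverses the order of the axes, like a.T
print(a.transpose().shape, a.T.shape)   # (2, 3, 4) (2, 3, 4)
```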

In [None]:
# Check that shuffling dimensions creates a view of the array

- **II.3.5. Resizing**: 

Size of an array can be changed with `ndarray.resize`:

In [None]:
a = np.arange(4)
print(a)
a.resize((8,))   # you give as argument the new shape of the array
a


However, the array must not be referenced anywhere else, otherwise `resize` typically raises a `ValueError`:

In [None]:
b = a
a.resize((4,))   
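
A hedged sketch of what happens, and of one possible workaround: the function `np.resize()` does not modify its input but returns a new array. Note that it fills the new array with repeated copies of the input, whereas the method `ndarray.resize` pads with zeros. The `try`/`except` is there because the reference check depends on the interpreter.

```python
import numpy as np

a = np.arange(4)
b = a                      # second reference to the same array
try:
    a.resize((8,))         # in-place resize, refused when other references are detected
except ValueError as err:
    print("resize failed:", err)

# np.resize() never touches its input: it returns a new array
c = np.resize(np.arange(4), (8,))
print(c)                   # [0 1 2 3 0 1 2 3]
```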

**Exercises:**

- Use flatten as an alternative to ravel. What is the difference? (Hint: check which one returns a view and which a copy)
- Experiment with transpose for dimension shuffling.


- **II.3.6. Meshgrid**: 

`np.meshgrid()` is a very useful function that returns coordinate matrices from coordinate vectors. It is extremely useful when you want to evaluate a function on a grid (i.e. $z = f(x, y)$), which is very common in observational astronomy! It is also useful when you want to do contour plots (to e.g. interpolate over a regular grid), evaluate a function at specific pixel positions of an image, ... 

To proceed, define your `x` and `y` vectors (corresponding to the (x, y) coordinates of the grid) and pass them to `np.meshgrid()`:
``` python
import numpy as np

x_vec, y_vec = np.linspace(0, 5, 6), np.linspace(0, 5, 3)
X, Y = np.meshgrid(x_vec, y_vec)

# Now you can evaluate the function z = (x**2 + y**2)
Z = X**2 + Y**2
```

`X` and `Y` created with `np.meshgrid()` are now arrays of shape (3, 6) (3 rows and 6 columns) containing respectively the x coordinate (for `X`) and the y coordinate (for `Y`) of each grid element. This can be generalised to more dimensions!

So, the array `Z` of shape (3,6) corresponds to points with the following coordinates:

['(0.0,0.0)', '(1.0,0.0)', '(2.0,0.0)', '(3.0,0.0)', '(4.0,0.0)', '(5.0,0.0)']   
['(0.0,2.5)', '(1.0,2.5)', '(2.0,2.5)', '(3.0,2.5)', '(4.0,2.5)', '(5.0,2.5)']   
['(0.0,5.0)', '(1.0,5.0)', '(2.0,5.0)', '(3.0,5.0)', '(4.0,5.0)', '(5.0,5.0)']   

**Note:**    
This function supports both indexing conventions through the `indexing` keyword argument. Giving the string `'ij'` returns a meshgrid with matrix indexing, while `'xy'` (the default) returns a meshgrid with Cartesian indexing. In the 2-D case with inputs of length M and N, the outputs are of shape (N, M) for 'xy' indexing and (M, N) for 'ij' indexing. In the 3-D case with inputs of length M, N and P, outputs are of shape (N, M, P) for 'xy' indexing and (M, N, P) for 'ij' indexing. In other words, in 2-D, 'ij' indexing yields the transpose of the arrays obtained with 'xy' indexing. See `help(np.meshgrid)` for more details. 
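
As a small sketch of the two conventions, reusing the `x_vec` and `y_vec` vectors defined above (the names `Xij` and `Yij` are only illustrative):

```python
import numpy as np

x_vec, y_vec = np.linspace(0, 5, 6), np.linspace(0, 5, 3)

X, Y = np.meshgrid(x_vec, y_vec)                      # default: indexing='xy'
Xij, Yij = np.meshgrid(x_vec, y_vec, indexing='ij')   # matrix indexing

print(X.shape, Xij.shape)          # (3, 6) (6, 3)
print(np.array_equal(Xij, X.T))    # True: in 2-D, 'ij' is the transpose of 'xy'

Z = X**2 + Y**2
print(Z[1, 0], Z[2, 5])            # 6.25 50.0, i.e. f(0, 2.5) and f(5, 5)
```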

In [None]:
#import numpy as np 
x_vec, y_vec = np.linspace(0, 5, 6), np.linspace(0, 5, 3)
print(x_vec, y_vec)
X, Y = np.meshgrid(x_vec,y_vec)
print(Y)

In [None]:
# Experiment with "meshgrid()" following the code above. 
#import numpy as np

# Try to write a command that prints to the screen the coordinates of the grid elements (as above) (TIP: you do not need meshgrid)


**Exercise**: 

We will use meshgrid [later](Modules_in__python_matplotlib.ipynb#meshgrid), after we have learned how to visualise results with `python`. 

### II.5 Reading arrays from a file and string formatting:    <a class="anchor" id="II.5a"></a>

Reading tables saved in a formatted text file can be done with `numpy.loadtxt('myfile.txt')`, while saving your array is done with `numpy.savetxt('myfile.txt', my_array)`.     

Clever loading of text/csv files: `numpy.genfromtxt()`/`numpy.recfromcsv()`. Those commands can fill missing values in a table, read column names, exclude some columns, and guess data-type using `dtype = None`.   

Fast and efficient, but numpy-specific, binary format: `numpy.save()`/`numpy.load()`.
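
The following sketch shows these functions in action. The file names, the small array and the CSV content are made up for the illustration, and a temporary directory is used so that nothing is left on disk.

```python
import io
import os
import tempfile

import numpy as np

data = np.array([[1.0, 2.5], [3.0, 4.5]])

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, 'table.txt')    # hypothetical file names
    npy_path = os.path.join(tmp, 'table.npy')

    np.savetxt(txt_path, data, fmt='%.3f')       # text file, 3 decimals per value
    from_txt = np.loadtxt(txt_path)

    np.save(npy_path, data)                      # numpy binary format
    from_npy = np.load(npy_path)

# genfromtxt reads the column names and fills the missing value with nan
csv_text = io.StringIO("a,b,c\n1,2,3\n4,,6")
table = np.genfromtxt(csv_text, delimiter=',', names=True)

print(from_txt)
print(from_npy)
print(table.dtype.names, table['b'])
```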

There is another flexible way to read from/write to a file, which is through the file object returned by `open()`. For this, three operations are generally needed (opening the file, reading or writing, and closing the file): 

``` python
with open('myfile.txt', 'r') as f:   # 'r' for read mode, 'w' for write mode, 'a' for append mode
    read_data = f.read()   # reads the whole file as a single string; other methods allow more flexible reads

# One can also do the following, but then the file may not be properly closed if an error occurs before f.close().

f = open('myfile.txt', 'r')
read_data = f.read()
f.close()
```

If you call `f.read()` twice, the second call returns an empty string: the file object keeps track of its current position, which after the first call "points" to the end of the file, so there is nothing left to read. The methods that access the file object go sequentially through its content. With `read()` you take the content as a single string (which could be a problem memory-wise if the file is large!). 
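
A minimal sketch of this behaviour, using a small temporary file created for the occasion (its name and content are made up). The method `seek(0)` moves the position back to the start of the file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'myfile.txt')    # hypothetical file
    with open(path, 'w') as f:
        f.write('first line\nsecond line\n')

    with open(path, 'r') as f:
        first = f.read()     # whole content
        second = f.read()    # '' : the position is already at the end of the file
        f.seek(0)            # move the position back to the start
        third = f.read()     # whole content again

print(repr(first), repr(second), repr(third))
```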

Note that iterating over the string returned by `f.read()` goes through the file content character by character, as you would realise by typing the following commands: 
``` python
f = open('myfile.txt', 'r')
for char in f.read():
    print(repr(char))
```

If you use a `for` loop on `f`, then the file is read line by line:
``` python
f = open('myfile.txt', 'r')
for line in f:
    print(repr(line))
```

In [None]:
with open('data.txt', 'r') as f: 
    read_data = f.read() 

In [None]:
f = open('data.txt', 'r')
for line in f:
    print(repr(line))   # repr(object) return the canonical string representation of the object

In [None]:
f.close()

In [None]:
f = open('data.txt', 'r')
a = f.readlines()
a

In [None]:
a[10].replace('.', ',')

In [None]:
f.close()

In [None]:
a

Each line is returned as a string. Notice the `\n` at the end of each line: this is the newline character, which indicates the end of a line.

Alternatively, you could also do:
``` python
f = open('myfile.txt', 'r')
for line in f.readlines():
    print(repr(line))
```
BUT `f.readlines()` actually reads in the whole file and splits it into a **list** of lines (while `for line in f` reads one line at a time), so for large files this can be memory intensive. Looping directly over `f` is therefore preferred.     

In [None]:
f = open('data.txt', 'r')
for line in f.readlines():
    print(repr(line))

In [None]:
print(repr(line))
a = line.strip().split()  
float(a[2])

Once a line is read, it is possible to apply string methods, as on any normal string:    
- Remove the leading and trailing whitespace, including `\n`: `line.strip()`
- Split the string into a list of strings: `line.split()`
- Replace a specific character by another: `line.replace(',', '.')`  replaces each comma by a dot.
- Access a specific element of a split list and convert it to float: `float(line.split()[2])`
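
These methods can be chained, as in the sketch below. The content of `line` is invented for the example; a line of your own file will look different.

```python
line = "12  0,75  3.2e-1\n"    # made-up line, with a decimal comma in the second field

clean = line.strip()                        # '12  0,75  3.2e-1'
fields = clean.replace(',', '.').split()    # ['12', '0.75', '3.2e-1']
values = [float(s) for s in fields]         # [12.0, 0.75, 0.32]

print(values)
```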

To write a file, you basically follow the same procedure: 
``` python
mylist_of_lines = ['first line\n', 'second line\n']   # example content; ensure that each line ends with `\n`

f = open('myfile.txt', 'w')
f.writelines(mylist_of_lines)   # writes the strings of the list one after the other

# you can also join the lines into a single string and write it in one go:
# f.write(''.join(mylist_of_lines))
f.close()
```

**Exercise:**

Read the file `data.txt` and display the columns you care about for that file using:
- the file object
- Try to do the same using `numpy.loadtxt()` 
- Try to do the same using `numpy.genfromtxt()`.     

Bonus:      
- Try to build a numpy array with the data in data.txt as read using f = open('data.txt'). 
- Modify 1 column of the file (replace it with 0) and write the results in `data_new.txt`

**Note:**

Those methods/functions for reading ASCII files are not always optimal to read tables containing both strings and floats. Other packages, such as `pandas` and `astropy`, offer more flexible functions to read a large variety of table formats.    

#### Formatting Strings

It often happens that you do not need to save all the decimals of a number, or would like to see it in scientific notation. There are [multiple ways to do it](https://docs.python.org/3/tutorial/inputoutput.html). One could spend (boring) hours describing all possible ways to format strings. The 2 main options are described below. You may look at https://pyformat.info/ to skim through various examples of formatting. The options described below explain the basics and point you to the relevant documentation. 

- **Option 1**: `printf-style` (simple (old style) but not universal) 

You can use the `%` operator to specify the formatting of the variable you want to show on the screen or save in a file. The variable does not appear explicitly in the string but after it, preceded by the `%` operator. Within the string, a conversion specifier such as `%f` for a float or `%e` for scientific notation marks where and how the value is inserted. The expression `'%.2f' % variable` formats `variable` as a float with 2 digits after the dot. This generalises to several variables by grouping them in a tuple: the string must then contain one specifier per variable, and specifiers and variables are matched in order. 

Example:
``` python
print('%i is the square of %i' % (4.000, 2))
# Out: 4 is the square of 2
```
Here are some commonly used formatting characters:
- `%s`: String (or any object with a string representation, like numbers)
- `%d` or `%i`: Integers
- `%.<number_of_digits>f`: Floating point numbers with fixed number of digits to the right of the dot. 
- `%.<number_of_digits>e`: scientific notation with fixed number of digits to the right of the dot.   
You may find more about string formatting in [python 3.8 documentation](https://docs.python.org/3.8/library/stdtypes.html#str).  
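
A few of these specifiers in action, as a small sketch (the variable names and values are arbitrary). The last line also uses a minimum field width, which is placed before the dot:

```python
name, n, x = 'sample', 42, 12345.678

s1 = '%s has %d entries' % (name, n)    # 'sample has 42 entries'
s2 = '%.2f' % x                         # '12345.68'
s3 = '%.3e' % x                         # '1.235e+04'
s4 = '%8.1f|' % x                       # ' 12345.7|'  (padded to a width of 8)

print(s1, s2, s3, s4, sep='\n')
```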


In [None]:
# Experiment with the above examples 

- **Option 2**: `str.format()` method

This is a much more flexible and general method, described in detail at https://docs.python.org/3/library/string.html#formatstrings. Format strings contain `replacement fields` surrounded by curly braces `{}`. Anything that is not contained in braces is considered literal text, which is *copied unchanged to the output*. If you need to include a brace character in the literal text, it can be escaped by doubling: `{{` and `}}`. The `replacement field` can start with a `field_name` that specifies the object whose value is to be formatted and inserted into the output instead of the replacement field. The `field_name` is optionally followed by a `conversion field`, which is preceded by an exclamation point `!`, and a `format_spec`, which is preceded by a colon `:`. These specify a non-default format for the replacement value. The `conversion field` causes a type coercion before formatting (you can in general ignore it). The `format_spec` is more advanced than in the printf style, allowing for alignment, sign control, filling of empty spaces, .... See [here](https://docs.python.org/3/library/string.html#format-specification-mini-language) and [here](https://pyformat.info/) for more details and EXAMPLES. 

Example:
``` python
print('{0:.0f} is the square of {1:n}'.format(4.000, 2))
# Out: 4 is the square of 2
```
If you wish a float representation with 2 decimals: `{0:.2f}`. 
You can also use the positional index to reverse the order of the arguments:
``` python
print('{1:.0f} is the square of {0:n}'.format(2, 4.000))
# Out: 4 is the square of 2
```

**Note**: 
- About `conversion field`: There are 3 possible conversions flags: `!s` which calls [str()](https://docs.python.org/3/library/stdtypes.html#str) on the value, `!r` which calls [repr()](https://docs.python.org/3/library/functions.html#repr) and `!a` which calls [ascii()](https://docs.python.org/3/library/functions.html#ascii).
- About `format_spec`: The general form of the formatter is
``` python
format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
```
See https://docs.python.org/3/library/string.html#format-specification-mini-language
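
To make the grammar above more concrete, here is a sketch with a few common combinations (the values are arbitrary):

```python
r1 = '{:>8.2f}'.format(3.14159)      # '    3.14'  : right-aligned, width 8, 2 decimals
r2 = '{:*^9}'.format('abc')          # '***abc***' : centred, width 9, filled with '*'
r3 = '{:+.1e}'.format(12345.678)     # '+1.2e+04'  : explicit sign, scientific notation
r4 = '{:08.3f}'.format(-3.14159)     # '-003.142'  : zero-padded to a width of 8
r5 = '{:,}'.format(1234567)          # '1,234,567' : thousands separator
r6 = '{!r}'.format('abc')            # "'abc'"     : conversion field, calls repr()
r7 = '{{x}} = {x}'.format(x=1)       # '{x} = 1'   : doubled braces are literal braces

print(r1, r2, r3, r4, r5, r6, r7, sep='\n')
```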

In [None]:
# Experiment with the above examples 

In [None]:
# Create three float variables a, b, c and give them some value (e.g. a=2.3, b=3, c=-5). 
# Print the sentence: `a=2.30, b=3 and c=-5.00e+00` using the formatting described above.

In [None]:
# Create a 1-D array of 5 floats and print their value with 2 digits floats. TIP: use list comprehension

**Note**: There is another very useful way in python to save "full objects" and to access and use them later with all their characteristics. This can be done by importing the `pickle` [module](https://docs.python.org/3/library/pickle.html). (In Python 2, the faster `cPickle` module was recommended; in Python 3, `pickle` uses its C implementation automatically.) When you want to write a pickle into a file, simply open your file in binary mode (`pkl_file = open(filename, 'wb')`), use `pickle.dump(obj, pkl_file, protocol=-1)`, and close your file (`pkl_file.close()`). To read an object saved in a pickle file, you can follow the same procedure but open the file with mode `'rb'` and use `obj = pickle.load(pkl_file)` instead of `pickle.dump()`. The `pandas` module also allows you to read/write pickle objects: see `pandas.read_pickle()` and `pandas.to_pickle()`
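
A minimal round trip could look like the sketch below. The dictionary and the file name are invented for the example, and a temporary directory is used so that nothing is left on disk.

```python
import os
import pickle
import tempfile

obj = {'name': 'target', 'flux': [1.0, 2.5, 3.0]}    # any picklable Python object

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'obj.pkl')               # hypothetical file name
    with open(path, 'wb') as pkl_file:                # binary mode is required
        pickle.dump(obj, pkl_file, protocol=-1)       # -1 selects the highest protocol
    with open(path, 'rb') as pkl_file:
        restored = pickle.load(pkl_file)

print(restored == obj)    # True
```

Only load pickle files that come from a source you trust, since unpickling can execute arbitrary code.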

### II.6 Other useful numpy functions:  <a class="anchor" id="II.6a"></a>

You'll find below a non exhaustive list of numpy functions that you may find useful when manipulating arrays. 

#### Sort, transform
- `np.sort(a)`: Returns sorted copy of an array along a specific axis (default = last axis)
- `np.absolute(a)`: Calculates absolute value element-wise 
- `np.concatenate((a1, a2, ...))`: Join a sequence of arrays along an existing axis (the arrays are passed together as a tuple or a list).
- `np.hstack(tup)`: Stack arrays in sequence horizontally (column wise).
- `np.vstack(tup)`: Stack arrays in sequence vertically (row wise).
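
A short sketch of these functions on two small arrays (chosen arbitrarily):

```python
import numpy as np

a = np.array([[3, -1], [2, 0]])
b = np.array([[5, 6]])

sorted_rows = np.sort(a)                    # each row sorted: [[-1, 3], [0, 2]]
abs_values = np.absolute(a)                 # [[3, 1], [2, 0]]
rows = np.concatenate((a, b), axis=0)       # b appended as a third row
rows_v = np.vstack((a, b))                  # same result as the line above
cols = np.hstack((a, b.T))                  # b.T appended as a third column

print(sorted_rows, abs_values, rows, cols, sep='\n')
```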

#### Search

- `np.nonzero()`: Return the indices of the elements that are non-zero. Returns indices of "True" if array of booleans. 
- `np.where(condition, [x, y])`: Return elements chosen from `x` or `y` depending on `condition`. If only `condition` is provided, it returns the indices for which the condition is true. The function is then a shorthand for `np.asarray(condition).nonzero()`. Using `nonzero` directly should be preferred, as it behaves correctly for subclasses. 
- `np.searchsorted(a, v)`: Find indices where elements should be inserted to maintain order.
- `np.intersect1d(ar1, ar2)`: Intersection of 2 arrays. 
- `np.isnan()`: Test element-wise for NaN and return result as a boolean array.
- `np.setdiff1d(ar1, ar2)`: Return the unique values in `ar1` that are not in `ar2`.
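
The search functions above can be tried on small inputs, as in this sketch (all values are arbitrary):

```python
import numpy as np

a = np.array([3, 0, 5, 0, 7])

idx = np.nonzero(a)[0]                               # [0 2 4]
clipped = np.where(a > 4, a, -1)                     # [-1 -1  5 -1  7]
pos = np.searchsorted([1, 3, 5, 7], 4)               # 2: 4 would be inserted before 5
common = np.intersect1d([1, 2, 3, 4], [3, 4, 5])     # [3 4]
only_first = np.setdiff1d([1, 2, 3, 4], [3, 4, 5])   # [1 2]
nan_mask = np.isnan([1.0, np.nan, 3.0])              # [False  True False]

print(idx, clipped, pos, common, only_first, nan_mask, sep='\n')
```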

#### Operations

- `np.convolve(a, v, mode='full')`: Returns the discrete, linear convolution of two one-dimensional sequences. 
- `np.nanmin()`, `np.nanmax()`, `np.nanmean()`, `np.nanmedian()`: Calculate the min, max, mean and median of an array, ignoring NaNs. 
- `np.deg2rad()`, `np.rad2deg()`: converts degrees to radians and vice versa 
- `np.trapz(y)`: Integrate along the given axis using the composite trapezoidal rule (renamed `np.trapezoid` in NumPy 2.0, where `np.trapz` is deprecated).
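
A brief sketch of these operations on arbitrary values. The name `trapezoid` below is a local alias introduced for this example, so that the code runs with both older and newer NumPy versions.

```python
import numpy as np

smoothed = np.convolve([1, 2, 3], [0, 1, 0.5])    # [0.  1.  2.5 4.  1.5]

values = np.array([1.0, np.nan, 3.0])
stats = (np.nanmin(values), np.nanmax(values), np.nanmean(values), np.nanmedian(values))
print(stats)                                      # min 1.0, max 3.0, mean 2.0, median 2.0

print(np.deg2rad(180.0), np.rad2deg(np.pi))       # pi (in radians) and 180 (in degrees)

# np.trapz was renamed np.trapezoid in NumPy 2.0: pick whichever is available
trapezoid = getattr(np, 'trapezoid', None) or np.trapz
area = trapezoid([1, 2, 3])                       # 4.0 with the default unit spacing
print(area)
```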