# 9. Numpy

Numpy is the package that makes scientific computing in Python possible in the first place. You will probably import it into every script and module you write, and if you don't, you are most likely importing a package that depends on `numpy`. It provides important data types such as the numpy array (`numpy.ndarray`), which is similar to a MATLAB array.

If you prefer R-style DataFrames, you can go with `pandas`, which is very widely used in data science. Under the hood, it is a very convenient wrapper around numpy.
As stated in the beginning, we won't cover `pandas` here, but you can find a 10 minute intro to pandas [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html).

## 9.1 Import convention
The convention to import numpy is this:

```Python
import numpy as np
```
You will also come across
```Python
from numpy import *
```
but I advise against using this approach. Namespaces are a good thing.
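
To illustrate why namespaces are a good thing, here is a small sketch of what a star import can silently change. Both Python and numpy define a function called `sum`, but the second argument means something different in each. The variable names are made up for illustration.

```python
import builtins
import numpy as np

# Python's built-in sum(iterable, start): the second argument is a start value.
builtin_result = builtins.sum(range(5), -1)   # (0 + 1 + 2 + 3 + 4) + (-1)

# numpy's sum(a, axis): the second argument is the axis to sum over.
numpy_result = np.sum(range(5), -1)           # 0 + 1 + 2 + 3 + 4 along the last axis

print(builtin_result, numpy_result)           # the two results differ
```

After `from numpy import *`, a plain `sum(...)` would refer to the numpy version, so existing code could change its result without any error message.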

In [2]:
import numpy as np

## 9.2 Numpy arrays

The most important thing numpy provides is a data type that holds numeric data efficiently. As usual, there is a constructor: the function `np.array` transforms other data types, such as lists, into numpy arrays. The class of the resulting object is `np.ndarray`.
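
As a minimal sketch of the conversion, using a made-up flat list (the exercise below uses a nested one):

```python
import numpy as np

values = [1.5, 2.0, 3.5]       # a plain Python list
arr = np.array(values)         # converted into a numpy array

print(type(arr))               # the class of the result is numpy.ndarray
print(arr.shape, arr.dtype)    # number of elements per dimension, element type
```

The `shape` and `dtype` attributes are the quickest way to check what you actually got back.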

**Exercise**

Transform the following list of lists into an `np.array`.

TODO: Concatenation, arithmetic, matrix algebra math functions, some methods, 

In [3]:
data = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
#your code here


`np.array`s provide a lot of methods.

#### 9.2.1.1 Matrix algebra

In MATLAB, the `*` operator performs matrix multiplication by default. In numpy, the same operator multiplies element by element. Python 3.5 introduced the `@` operator for matrix multiplication. That the language gained a dedicated operator for this purpose also shows how much numpy means to the Python universe.
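
A minimal sketch of the difference in numpy, using two small made-up matrices: `*` multiplies element by element, while `@` performs matrix multiplication.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B    # each entry of A times the matching entry of B
matmul = A @ B         # rows of A times columns of B

print(elementwise)     # [[0 2], [3 0]]
print(matmul)          # [[2 1], [4 3]]
```

`np.matmul(A, B)` gives the same result as `A @ B`, in case you come across it in older code.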

The following are nonsense examples.

Assuming the relation y = **X** betas + e, compute `y_hat` from the following matrix `X` (data) and vector `betas` (regression weights) using matrix multiplication.

In [4]:
betas = np.array([[1.4], [0.4], [0.98]]) # the nested lists are needed to create a column vector
X = np.random.randint(1, 100, (100,3))

In [5]:
#your code here


Subtraction and addition work element-wise, the same as in MATLAB, as long as the shapes can be broadcast together.
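
Broadcasting can surprise you when the shapes differ. The following sketch uses made-up arrays to show that subtracting two column vectors works as expected, whereas mixing a column vector with a flat, one-dimensional array quietly produces a full matrix.

```python
import numpy as np

column = np.array([[1.0], [2.0], [3.0]])         # shape (3, 1)
other_column = np.array([[0.5], [0.5], [0.5]])   # shape (3, 1)
flat = np.array([0.5, 0.5, 0.5])                 # shape (3,)

same_shape = column - other_column   # element-wise, stays (3, 1)
broadcasted = column - flat          # broadcast to (3, 3)

print(same_shape.shape, broadcasted.shape)
```

If a result suddenly has more dimensions than you expected, checking `.shape` on both operands is usually the fastest way to find out why.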

**Exercise**

Compute the error between your computed y_hat and the following "empirical" y:

In [6]:
y = np.random.randn(100, 1) * 50 + 145

In [7]:
#your code here


For basic operations on the data, np.arrays provide methods, e.g. `array.sum()`.
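
A short sketch of a few such methods on a made-up array. Most of them accept an optional `axis` argument that selects the dimension to operate along.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = a.sum()             # sum over all elements
col_sums = a.sum(axis=0)    # collapse the rows: one sum per column
row_sums = a.sum(axis=1)    # collapse the columns: one sum per row
mean = a.mean()             # mean over all elements

print(total, col_sums, row_sums, mean)
```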

**Exercise**

Compute the sum of squared residuals. *Hint*: You can do it in one line.

In [8]:
#your code here


There's a ton more functions that arrays provide. If you use tab completion to get an idea of how many, you'll understand that we can't cover them all. However, we can cover one pitfall:

<br/>

#### 9.2.1.2 Fortran order vs. C order

C and Fortran define different standards for how arrays are stored in memory: [Row-major and column-major order](https://en.wikipedia.org/wiki/Row-_and_column-major_order), respectively. Put simply: when you use `matrix(:)` in MATLAB, do you glue the rows together or the columns? Fortran and MATLAB are column-major. C and numpy are row-major.

Have a look:

**Exercise**

   1. Use `np.arange` to create a vector of values from 1 to 25. `np.arange` works like `range()`, except it returns an `np.array`.
   2. Use the array method `array.reshape()` to reshape it into a 5×5 matrix.
   3. What would you expect? Print the array.


In [None]:
#your code here


The same problem arises for `array.ravel()` and `array.flatten()`, which act similarly ([but not identically](https://stackoverflow.com/questions/28930465/what-is-the-difference-between-flatten-and-ravel-functions-in-numpy)). Fortunately, there is a way around this. All of these functions take an optional parameter `order`. You can pass either `'C'` for row-major order or `'F'` for column-major, Fortran/MATLAB-like order.
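
As a sketch of what the `order` parameter does, here it is applied to `reshape` on a small made-up vector:

```python
import numpy as np

v = np.arange(1, 7)                         # [1 2 3 4 5 6]

row_major = v.reshape((2, 3), order='C')    # fill row by row (numpy default)
col_major = v.reshape((2, 3), order='F')    # fill column by column (MATLAB-like)

print(row_major)    # [[1 2 3], [4 5 6]]
print(col_major)    # [[1 3 5], [2 4 6]]
```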

**Exercise**

Use `ravel` or `flatten` to reshape the matrix into a vector, using column-major order.

In [35]:
#your code here


Numpy is obviously inspired by MATLAB. Some functions are the same, for example `linspace`. Others have close analogues. You can find a side-by-side view [here](https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Apart from that, you just have to use numpy to get used to it.
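
A brief sketch of one identical function and one close analogue, with made-up values. `np.linspace` behaves like its MATLAB namesake, while `np.arange` resembles MATLAB's colon syntax but excludes the stop value.

```python
import numpy as np

grid = np.linspace(0, 1, 5)      # 5 evenly spaced points, both endpoints included
steps = np.arange(0, 1, 0.25)    # start, stop, step: the stop value 1 is excluded

print(grid)     # 0, 0.25, 0.5, 0.75, 1
print(steps)    # 0, 0.25, 0.5, 0.75
```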

<br/>

# Conclusion

This barely scratched the surface. There is **a lot** more to learn about these packages, but hopefully you now have a rough idea of what to expect from them.
This was the last bit of spoonfeeding. I think you're ready to be pushed into the cold water and learn how to swim. See you in the last notebook.