# NumPy – Numerical Python Library
NumPy is so central to Python that it is often not even thought of as a third-party library. It has become a core component, if not of Python itself, then certainly of an engineer's (especially a data engineer's) arsenal. I won't spend time on a prologue and will get straight to business. We can import NumPy as:

In [1]:
import numpy

We will divide this tutorial into two parts:

- Fundamentals – covered here
- Advanced – in the next notebook

## Outline

This notebook is heavily influenced by **[NumPy's own documentation](https://numpy.org/doc/stable/user/index.html)**.

- NumPy Arrays
- Linear Algebra



## Arrays

NumPy arrays are preferable to Python lists for a number of reasons, including efficiency and several other features we will cover in this notebook.
> **Note:** If you are interested, feel free to look up the differences between NumPy arrays and Python lists in detail online. I would rather introduce them one at a time than list them all at once.

There can be a number of ways of making a NumPy array, like:

### From Python Collections

Python sequences (lists and tuples) can be converted to a NumPy array directly; sets behave differently, as the example shows:

**`<np array> = numpy.array(<Python collection>)`**

For example:

In [2]:
listA = [34, 21, 45]
setB = {2, 4, 6}
tupleC = (2, 4, 8, 16, 24)

Now converting them into NumPy arrays respectively:

In [3]:
npA = numpy.array(listA)
print(npA)

npB = numpy.array(setB)
print(npB)

npC = numpy.array(tupleC)
print(npC)

[34 21 45]
{2, 4, 6}
[ 2  4  8 16 24]


Here we need to pause and reflect.

- `npA` and `npC` have 3 and 5 elements respectively, but `npB` has only 1.
- It may look as if `2`, `4` and `6` are separate elements of the NumPy array (`npB`), but they are actually part of a single set: a set is not a sequence, so NumPy stores the whole set as one object.
- This also means that NumPy arrays can hold Python collections as their elements.
- Tuples are immutable, and we haven't touched that immutability here. `tupleC` is still intact, and `npC` is a separate array with the tuple's elements copied into it.
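
To see what actually happened to the set, we can inspect the array it produced. The following is a small sketch rather than part of the original run: the names `arrFromSet` and `arrFromSorted` are made up for this illustration, and sorting the set first is just one reasonable way to get an element-wise array (sets have no defined order).

```python
import numpy

setB = {2, 4, 6}

# Passing the set directly wraps the whole set in a single-element object array.
arrFromSet = numpy.array(setB)
print(arrFromSet.dtype, arrFromSet.shape, arrFromSet.size)

# Turning the set into a sorted list first gives an ordinary array
# with one element per member of the set.
arrFromSorted = numpy.array(sorted(setB))
print(arrFromSorted)
```

The first array has an empty shape `()` and the `object` dtype, which is why it reports a size of 1.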

By the way, before we proceed: typing `numpy` before every function call gets tedious, so the module is usually aliased. The conventional alias is `np`.

---

Let's confirm the above points using the NumPy array's **`size`** attribute. It's one of a number of attributes and methods that arrays provide (we will refer to NumPy arrays as arrays from now on).

In [4]:
import numpy as np

print(npA.size)
print(npB.size)
print(npC.size)

3
1
5


### Intrinsic Creation

We can also make an array from scratch by using NumPy's built-in creation functions, like:

- `arange()`
- `linspace()`
- `eye()`
- `diag()`

We will see the first two right now and the matrix-oriented ones later, in the respective section.

In [5]:
x = range(10)
x

range(0, 10)

#### **`arange()`**

Whenever we need an array with an arithmetic sequence, `arange()` is the function we are looking for.

Its syntax is:

`<arr> = np.arange(a,b)`

where the values lie in the half-open interval $[a,b)$: `a` is included and `b` is excluded.

In [6]:
first10 = np.arange(1,11)
first10

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

We can also use a step other than the default (`1`):

`<arr> = np.arange(a,b,d)`

**Note:** The $n$th element of the sequence is:

$$a_n = a+(n-1)d$$



In [7]:
odd20 = np.arange(1,40,2)
odd20

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,
       35, 37, 39])
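
The formula for the $n$th element can be checked against the array directly. This is a minimal sketch; the variable names (`seq`, `nth_from_formula`) are chosen only for this illustration.

```python
import numpy as np

a, b, d = 1, 40, 2          # start, stop (excluded) and step
seq = np.arange(a, b, d)

n = 20                      # 1-based position in the sequence
nth_from_formula = a + (n - 1) * d

# Arrays are 0-indexed, so the nth element sits at index n - 1.
print(seq[n - 1], nth_from_formula)
```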

#### **`linspace()`**

Often while plotting data in 2D, we need to divide the domain and/or range evenly. For example, we may have the interval $[1,10]$ and want to divide it into 50 equidistant points.

For such cases, `linspace()` comes to the rescue. Its format is:

`<arr> = np.linspace(a,b,n)`

where `n` is the total number of values we need (like `50` in the above example).

**Important:** Note the closing square bracket in the interval above. Unlike `arange()`, `linspace()` does **include** `b`.

In [8]:
x = np.linspace(0,10,50)
x

array([ 0.        ,  0.20408163,  0.40816327,  0.6122449 ,  0.81632653,
        1.02040816,  1.2244898 ,  1.42857143,  1.63265306,  1.83673469,
        2.04081633,  2.24489796,  2.44897959,  2.65306122,  2.85714286,
        3.06122449,  3.26530612,  3.46938776,  3.67346939,  3.87755102,
        4.08163265,  4.28571429,  4.48979592,  4.69387755,  4.89795918,
        5.10204082,  5.30612245,  5.51020408,  5.71428571,  5.91836735,
        6.12244898,  6.32653061,  6.53061224,  6.73469388,  6.93877551,
        7.14285714,  7.34693878,  7.55102041,  7.75510204,  7.95918367,
        8.16326531,  8.36734694,  8.57142857,  8.7755102 ,  8.97959184,
        9.18367347,  9.3877551 ,  9.59183673,  9.79591837, 10.        ])

**Question:** Why don't we get exactly 0, 0.2, 0.4, ... and get these values instead?

In [9]:
import math
print(round(math.pi,2))

3.14
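
One way to explore the `linspace()` question above is to ask NumPy for the spacing it used. The sketch below relies on the `retstep=True` option of `np.linspace()`; the variable names are illustrative only. Because both endpoints are included, 50 points leave only 49 gaps between them.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points.
points50, step50 = np.linspace(0, 10, 50, retstep=True)
print(step50)          # 10 / 49, roughly 0.2041

# 51 points create 50 gaps, so the spacing becomes 10 / 50.
points51, step51 = np.linspace(0, 10, 51, retstep=True)
print(step51)
```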


### From Existing Arrays

Similarly, we can use slicing (as we saw earlier for the primitive Python collections) and other operations. For example:

In [10]:
npD = npC[1:4]
print(npC)
print(npD)

[ 2  4  8 16 24]
[ 4  8 16]


Nothing exceptional here. But try changing a value in `npD`:

In [11]:
npD[0] = -2

print(npC)
print(npD)

[ 2 -2  8 16 24]
[-2  8 16]


**Ouch!** The change has reflected in the original array (`npC`) as well.





In case you forgot, try the same with Python lists:

In [12]:
listA = [1, 2, 3, 4, 5]
listB = listA[0:2]

print("Before changing listB's first value",listA)
print("Before changing listB's value, listB is:", listB)

listB[0] = -3

print("After changing listB's first value, listA is:",listA)
print("After changing listB's value, listB is:", listB)

Before changing listB's first value [1, 2, 3, 4, 5]
Before changing listB's value, listB is: [1, 2]
After changing listB's first value, listA is: [1, 2, 3, 4, 5]
After changing listB's value, listB is: [-3, 2]


This requires some attention. Whenever we make a new (NumPy) array from an existing one, one of three things can happen:

- **Aliasing** – The same array gets a second name. Both names refer to the same object.
- **Shallow Copy** – The new array is a *view* that shares the original array's data, so any change in the new array is reflected in the original one too (as we already saw). Slicing produces this kind of array.
- **Deep Copy** – The new array is truly an independent copy, and changes in it are not reflected in the existing array. It's achieved using the **`copy()`** method.

Let's illustrate this with an example:

In [13]:
npD = npC
npE = npC[0:3]
npF = npC.copy()

print("Original array's id is:",id(npC))
print("Aliased array's id is:",id(npD))
print("Shallow copied array's id is:",id(npE))
print("Deep copied array's id is:",id(npF))

Original array's id is: 132042939353296
Aliased array's id is: 132042939353296
Shallow copied array's id is: 132042939358288
Deep copied array's id is: 132042939358000


We have already seen how a change in the shallow-copied array is reflected in the original array. Let's wrap up by checking the deep copy.

In [14]:
print("npC before changing the deep copy",npC)

npF[2] = 6

print("npC after changing the deep copy",npC)

npC before changing the deep copy [ 2 -2  8 16 24]
npC after changing the deep copy [ 2 -2  8 16 24]
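
Comparing `id()` values tells us whether two names point to the same object, but not whether two different array objects share data. As a hedged sketch, `np.shares_memory()` and the `base` attribute can answer that second question; the array names below are made up for this illustration.

```python
import numpy as np

original = np.array([2, 4, 8, 16, 24])
alias = original            # same object, second name
view = original[0:3]        # slice: new array object, shared data
deep = original.copy()      # independent data

print(alias is original)                   # True
print(np.shares_memory(original, view))    # True
print(np.shares_memory(original, deep))    # False
print(view.base is original)               # True: the view is built on `original`
```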


### From Files

CSV and similar text formats are commonly used for platform-independent data sharing. We can import a CSV file using [**`numpy.loadtxt()`**](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html#numpy.loadtxt) as:

`np.loadtxt(<csv file name>, delimiter = <>, skiprows = x)`

where `delimiter` is `,` for CSV, a tab for TSV and so on, while `x` is the number of leading rows (such as a header) to skip. If the file has a single header row, set it to 1.
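
As a minimal sketch of the call above, the example below reads from an in-memory text buffer instead of a real file, so that it runs anywhere. The data and the name `csvText` are hypothetical; with a real file you would pass its path instead.

```python
import io
import numpy as np

# A small in-memory stand-in for a CSV file (made-up data).
csvText = io.StringIO("x,y,z\n1,2,3\n4,5,6\n")

# Skip the single header row and split the remaining rows on commas.
data = np.loadtxt(csvText, delimiter=",", skiprows=1)
print(data)
print(data.shape, data.dtype)
```

Note that `loadtxt()` returns floating-point values by default, even when the file contains only integers.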

> **Note:** Usually, these file handling processes are managed in a better way using the **Pandas** library, something we will see in the following notebook.

---

Now we will see NumPy's functionalities for a number of applications.

## Linear Algebra

One of the hallmark benefits of NumPy is the ease and efficiency of working with matrix (or vector) data. In fact, NumPy has a dedicated module (**`linalg`**) with a number of linear algebra features.

Let's cover some of them.

### Vectors

Since one-dimensional NumPy arrays are vectors, linear algebra operations apply to them as well. To keep things simple, I will talk about vectors first and only later generalize to matrices.





#### Inner Product

The inner product of two vectors is:

$$a \cdot b = a^Tb$$

It can be calculated in NumPy using three different syntaxes:

In [15]:
import numpy as np
import numpy.linalg as la #LA as an alias of linear algebra

In [16]:
a = np.array([1,2,3])

b = np.array([4,5,6])

**Using `inner()`**

We can take the inner product using...`inner()` function (no prizes for guessing that, unfortunately).


In [17]:
c = np.inner(a,b)
c

32

**Using `dot()`**

We can also compute it using `dot()`. For one-dimensional arrays it gives the same result as `inner()` (the two differ for higher-dimensional arrays). Since the inner product is commutative, let's try it for $b \cdot a$:

In [18]:
d = np.dot(b,a)
d

32

**Using `@`**

If you ask for my favourite syntax (as you will see from here on), it's neither `inner()` nor `dot()` but the one closest to mathematical notation:

In [19]:
e = a@b
e

32
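
The three syntaxes agree for one-dimensional arrays, but looking ahead slightly to two-dimensional arrays, `inner()` behaves differently from `dot()` and `@`. The sketch below uses two small made-up matrices to show the difference.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(np.dot(A, B))     # matrix product, same as A @ B
print(A @ B)
print(np.inner(A, B))   # pairs rows of A with rows of B, same as A @ B.T
```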

#### Cross Product

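The cross product is interesting because its result is itself a vector:

$$a \times b = \|a\| \|b\| \sin(\theta)\, \hat{n}$$

where $\theta$ is the angle between $a$ and $b$, and $\hat{n}$ is the unit vector perpendicular to both.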

In [20]:
crossProd = np.cross(a,b)

crossProd

array([-3,  6, -3])

As we can see, the result is itself a vector. The cross product is not commutative:

In [21]:
revCrossProd = np.cross(b,a)

revCrossProd

array([ 3, -6,  3])

They are negatives of each other (i.e. each element is the additive inverse of its counterpart). In other words, the cross product is anticommutative: $b \times a = -(a \times b)$.
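
Two properties of the cross product can be checked numerically: the result is perpendicular to both inputs, and swapping the operands flips its sign. This is a small sketch using the same vectors as above.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
crossProd = np.cross(a, b)

# The cross product is perpendicular to both inputs,
# so its inner product with each of them is 0.
print(crossProd @ a, crossProd @ b)

# Swapping the operands flips the sign of every component.
print(np.array_equal(np.cross(b, a), -crossProd))   # True
```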

#### Linear Functions

A linear function can be written in the form of:

$$f(x) = a_0x_0+a_1x_1+a_2x_2+\dots$$

There can be a number of ways of defining a linear function.



**1. Specifying the coefficient variables**

It can be helpful if we have a linear function with just a few terms. For example:

In [22]:
# a, b and c are looked up when linearFunc is called, so they can be set in the next cell
linearFunc = lambda x: a*x[0]+b*x[1]+c*x[2]

Let's test it with some coefficients and a vector:




In [23]:
a = 2
b = 12
c = 3

z = np.array([2,4, 6])

linearFunc(z)

70

**2. Specifying it as an inner product**

The previous method is okay, but I don't find it neat. A better way is to write it as:

$$f(x)=a^Tx = a \cdot x$$

>**Side Note:** In programming we usually treat vectors as column vectors, hence $a^Tx$. It would be $ax^T$ if it were the other way around, i.e. row vectors.

Trying the example above, we get the same answer:


In [24]:
linearFunc2 = lambda x:a@x

a = np.array([2,12,3])

linearFunc2(z)

70
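
What makes such a function "linear" is superposition: $f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$. As a hedged sketch, we can check this numerically for the inner-product form. The second vector `y` and the scalars `alpha` and `beta` are arbitrary values picked for this illustration.

```python
import numpy as np

coeffs = np.array([2, 12, 3])          # same coefficients as in the example above
f = lambda x: coeffs @ x

x = np.array([2, 4, 6])
y = np.array([1, 0, -1])               # arbitrary second vector
alpha, beta = 3, -2                    # arbitrary scalars

lhs = f(alpha * x + beta * y)
rhs = alpha * f(x) + beta * f(y)
print(lhs, rhs)                        # both sides agree
```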

#### Norm

A vector contains a number of elements. To get a single-number "summary" of its size, we use a norm. The $p$-norm is defined as:

$$||x||_p = \sqrt[p]{|x_1|^p+|x_2|^p+|x_3|^p+\dots} = \left(|x_1|^p+|x_2|^p+|x_3|^p+\dots\right)^\frac{1}{p}$$

It has some special instances, like:

- **$L_1$ norm** – Putting $p$ as 1, we get:

$$||x||_1 = |x_1|+|x_2|+|x_3|+\dots$$

- **$L_2$ norm** – The most common norm is the Euclidean norm ($p = 2$), defined as:

$$||x||_2 = \sqrt{x_1^2+x_2^2+x_3^2+\dots}$$

>**Note:** For $p \geq 1$, the $p$-norm is a convex function.

For norm calculation in NumPy, we will use **`linalg.norm()`**, which computes the $L_2$ norm of a vector by default:



In [25]:
x = np.arange(1,10)

la.norm(x)

16.881943016134134

Let's verify it. I mean, why not?!

In [26]:
import math
normLambda = lambda x: math.sqrt(sum(x**2))

In [27]:
normLambda(x)

16.881943016134134
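
Other values of $p$ can be selected through the `ord` argument of `linalg.norm()`. The sketch below computes the $L_1$ and $L_2$ norms of the same vector and compares them with the definitions written out by hand.

```python
import numpy as np
import numpy.linalg as la

x = np.arange(1, 10)

l1 = la.norm(x, ord=1)      # sum of absolute values
l2 = la.norm(x, ord=2)      # Euclidean norm, same as the default

print(l1, np.sum(np.abs(x)))
print(l2, np.sqrt(np.sum(x**2)))
```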

### Matrices

Now we can use these building blocks to learn how to create and use matrices in NumPy.

There are a number of ways of making a matrix.

#### From Vectors

If we revisit a matrix:

$$ A = \begin{bmatrix}
	a & b & c \\
	d & e & f\\
	g & h & i \\
  j & k & l \\
	\end{bmatrix}
$$

It is nothing but a collection of rows or can even be treated as a collection of columns.

If we treat it as a collection of rows, it can be written as:

$$ A = \begin{bmatrix}
	a_1 \\
	a_2\\
	a_3 \\
  a_4 \\
	\end{bmatrix}
$$

Where,

$$ a_1 = \begin{bmatrix}
	a & b & c \\
  \end {bmatrix}
  $$
$$ a_2 = \begin{bmatrix}
	d & e & f \\
  \end {bmatrix}
  $$

And so on.

---

This is known as **stacking**. We can do it for collection of rows using **`vstack()`** as:



In [28]:
import numpy as np

In [29]:
a1 = np.array([1,2,3])
a2 = np.array([4, 5, 6])
a3 = np.array([7,8,9])
a4 = np.array([10,11,12])

A = np.vstack((a1,a2,a3,a4))

A

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

Similarly, we can also treat a matrix as a stack of columns (placed side by side, horizontally).

If we treat the above example as a collection of columns, it can be written as:

$$ A = \begin{bmatrix}
	a_1 & a_2 & a_3 \\
	\end{bmatrix}
$$

Where,

$$ a_1 = \begin{bmatrix}
	a \\
  d \\
  g \\
  j \\
  \end {bmatrix}
  $$

and

$$ a_2 = \begin{bmatrix}
	b \\
  e \\
  h \\
  k \\
  \end {bmatrix}
  $$

And so on.

We can do it using **`hstack()`**:

In [30]:
a1 = np.array([[1],[4],[7],[10]]) #Making a column vector here.

a2 = np.array([[2],[5],[8],[11]])

a3 = np.array([[3],[6],[9],[12]])

B = np.hstack((a1,a2,a3))

B

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
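
If the columns are available as ordinary one-dimensional arrays rather than explicit column vectors, `np.column_stack()` is one alternative worth knowing. The sketch below (with illustrative names `c1`, `c2`, `c3`) also shows what plain `hstack()` does with one-dimensional inputs.

```python
import numpy as np

c1 = np.array([1, 4, 7, 10])
c2 = np.array([2, 5, 8, 11])
c3 = np.array([3, 6, 9, 12])

# column_stack treats each 1-D array as one column of the result.
B = np.column_stack((c1, c2, c3))
print(B.shape)      # (4, 3)

# Plain hstack on 1-D arrays joins them end to end instead.
flat = np.hstack((c1, c2, c3))
print(flat.shape)   # (12,)
```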

#### From Built-in Functions

As we saw above, vectors can be made using built-in functions (like `arange()`). Similarly, we can make matrices using some intrinsic functions, like:

- `ones()`
- `zeros()`
- `identity()`
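
A quick sketch of these functions, together with `eye()` and `diag()` mentioned earlier, is given below. The shapes and values are arbitrary choices for this illustration.

```python
import numpy as np

onesMat = np.ones((2, 3))        # 2x3 matrix filled with 1.0
zerosMat = np.zeros((3, 2))      # 3x2 matrix filled with 0.0
identityMat = np.identity(3)     # 3x3 identity matrix
eyeMat = np.eye(3)               # same result as identity(3)
diagMat = np.diag([1, 2, 3])     # the given values on the main diagonal

print(onesMat)
print(zerosMat)
print(identityMat)
print(diagMat)
```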


### Broadcasting

Broadcasting is one of the more confusing aspects of NumPy and hence deserves special attention.

Let's start with an example. What happens when we add a scalar to an array?

In [31]:
A = np.array([1,2,3])

B = A + 1

B

array([2, 3, 4])

I am sure any linear algebra student/expert will find it bizarre (which it is). But let's see what happens behind the scenes:

The scalar $1$ is conceptually _stretched_ into an array with the same shape as $A$, so that the two operands are consistent.

In other words,

  $$A + 1 =
  \left[ {\begin{array}{cc}
    a_{1} \\
    a_{2}\\
    \vdots \\
    a_n
  \end{array}} \right] + \left[ {\begin{array}{cc}
    {1} \\
    1\\
    \vdots \\
    1
  \end{array}} \right] = \left[ {\begin{array}{cc}
    a_1+1 \\
    a_2+1\\
    \vdots \\
    a_n+1
  \end{array}} \right]$$

  Actually, we can write the LHS of the above equation as:

  $$A + \textbf 1$$

  This is totally consistent with linear algebra notation (for example, see _Convex Optimization, Boyd and Vandenberghe, Cambridge (2004)_), where a vector with all components equal to one is written as $\textbf 1$ (notice the boldface here).
  



**Why Useful?**

Broadcasting lets the operation run as a vectorized NumPy operation (i.e. the loop runs in C rather than in Python).

**Why It's Not a Good Idea**

It's inconsistent with strict linear algebra notation, and in some cases it can lead to inefficient use of memory. So before proceeding further, it's necessary to know the bottom line: **Don't use broadcasting if you are unsure.**
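
When in doubt, printing the shapes involved is a cheap check. The sketch below uses `np.broadcast_to()` to make the "stretched" operand visible, and then shows a common surprise where two three-element arrays combine into a matrix. The names `stretched` and `col` are illustrative only.

```python
import numpy as np

A = np.array([1, 2, 3])

# broadcast_to shows the "stretched" operand as a read-only view.
stretched = np.broadcast_to(1, A.shape)
print(stretched)                              # [1 1 1]
print(np.array_equal(A + 1, A + stretched))   # True

# A common surprise: a column of shape (3, 1) plus a 1-D array of shape (3,)
# is broadcast to a full (3, 3) matrix.
col = A.reshape(3, 1)
print((col + A).shape)                        # (3, 3)
```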

## Acknowledgements

This notebook wouldn't have been possible without:

- NumPy Documentation
- Applied Linear Algebra, Stephen Boyd (2018)