# Why NumPy and NumPy arrays?
- A way to represent multi-dimensional arrays (i.e. vectors, matrices, images, tables, tensors, etc.) in Python
- Fast operations on such multi-dimensional arrays (a.k.a. vectorized operations)
- Used by many other Python packages, such as pandas and astropy

   
     
In this tutorial, we will learn how to use NumPy by working through various examples from physics and astronomy. The focus is on learning by doing. A single tutorial cannot cover everything, so feel free to refer to the [NumPy documentation](https://numpy.org/doc/stable/) and to search online as you go.
To start, let's import NumPy. The recommended way is:

In [1]:
import numpy as np



Let's start by testing NumPy's speed. We can do this using the IPython cell magic `%%timeit`. For example, without using NumPy, how would you find the cube root of every integer from 0 up to (but not including) 100?

In [2]:
%%timeit

### INSERT CODE HERE ###
for i in range(100):
    i ** (1 / 3)
#########################

8.31 µs ± 96.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


Now let's try it using NumPy. Run the example code below to time the same operation using NumPy arrays. How do the two methods compare?

In [3]:
%%timeit

arr = np.arange(0, 100)
cbrt = arr ** (1 / 3)

3.39 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
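Outside of a notebook, the same comparison can be sketched with Python's standard `timeit` module. This is only an illustration: the helper function names and the repetition count (`number=1000`) are arbitrary choices for this sketch, and the timings will differ from machine to machine. It also uses `np.cbrt`, NumPy's dedicated cube-root function, as an alternative to `** (1 / 3)`.

```python
import timeit

import numpy as np


def cube_roots_loop():
    # Pure-Python version: one interpreted operation per element
    return [i ** (1 / 3) for i in range(100)]


def cube_roots_numpy():
    # NumPy version: one vectorized operation on the whole array
    return np.arange(0, 100) ** (1 / 3)


# number=1000 is an arbitrary repetition count chosen for this sketch
t_loop = timeit.timeit(cube_roots_loop, number=1000)
t_numpy = timeit.timeit(cube_roots_numpy, number=1000)
print(f"loop:  {t_loop:.4f} s for 1000 runs")
print(f"numpy: {t_numpy:.4f} s for 1000 runs")

# Both approaches give the same values (up to floating-point rounding)
same_values = np.allclose(cube_roots_loop(), cube_roots_numpy())
print(same_values)

# np.cbrt computes the cube root directly
same_as_cbrt = np.allclose(np.cbrt(np.arange(0, 100)), cube_roots_numpy())
print(same_as_cbrt)
```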


Now imagine doing hundreds, thousands, or even millions of calculations. This difference will really add up! But why is NumPy so fast? There are a few reasons:
1. Arrays contain data of a single type, stored together in memory, which allows for fast access.
  
2. NumPy applies an operation to a whole array at once in compiled loops (vectorization) instead of interpreting a Python loop element by element. Some routines, such as linear algebra backed by a multi-threaded BLAS library, can also run in parallel.
  
3. NumPy uses precompiled C, C++, and Fortran code. These languages are much faster than Python.
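The first point can be checked directly on an array. The sketch below uses a hypothetical 100-element `float64` array and inspects a few standard `ndarray` attributes to show that every element has the same type and size, and that the elements sit next to each other in memory.

```python
import numpy as np

# Hypothetical example array: 100 double-precision floats
arr = np.arange(100, dtype=np.float64)

print(arr.dtype)                   # float64: every element has this type
print(arr.itemsize)                # 8 bytes per element
print(arr.nbytes)                  # 800 bytes in total (100 * 8)
print(arr.flags["C_CONTIGUOUS"])   # True: elements are stored side by side
```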
  

## Representing positions of particles using arrays
### Creating an array

The position of a particle in 3D is represented by a collection of three numbers. We can create a NumPy array to store its coordinates ($x,y,z$). Note that all elements in the array must be of the same type. This characteristic is part of what makes NumPy so fast.

In [4]:
pos_1 = np.array([1, 2, 3])

We can print out the array and check its type to see what has been stored.

In [5]:
print(pos_1)

[1 2 3]


In [6]:
type(pos_1)

numpy.ndarray

Now store the position of another particle, which is located at coordinates $(4,5,6)$.

In [7]:
pos_2 = np.array([4, 5, 6])         # COMPLETE THIS LINE OF CODE

NumPy takes advantage of object-oriented programming. NumPy arrays are Python objects and have "attributes" associated with them. Attributes are properties contained in the object. For example, the `ndim` and `shape` attributes can be used to check the number of dimensions and the shape of the array. Python objects also have "methods", which are functions that act on the object. We will discuss these later.

In [8]:
pos_1.ndim

1

In [9]:
pos_1.shape

(3,)

The above shape indicates that the array has only one dimension and contains 3 elements. In NumPy terminology, each index needed to locate an element corresponds to an `axis`, and the total number of such axes is the array's number of dimensions.

An object with 1 dimension is like a vector, while an object with 2 dimensions is like a matrix. This idea generalizes to an `n-dimensional` array, i.e. locating an element in that array requires specifying `n` indices (one per axis). Check out another array attribute, `dtype`, which tells us the type of data the array holds.

In [10]:
pos_1.dtype         # COMPLETE THIS LINE OF CODE

dtype('int64')
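To make the axis and dimension terminology concrete, here is a small sketch with a hypothetical 2-D array (the name `positions` is made up for this illustration) that stacks the coordinates of the two particles. Locating one number now takes two indices: which particle and which coordinate.

```python
import numpy as np

# Hypothetical example: the two particle positions stacked into one 2-D array
positions = np.array([[1, 2, 3],
                      [4, 5, 6]])

print(positions.ndim)    # 2 -> two axes: (particle, coordinate)
print(positions.shape)   # (2, 3) -> 2 particles, 3 coordinates each
print(positions.size)    # 6 -> total number of elements

# The exact integer dtype can depend on the platform, so we only check its kind
print(np.issubdtype(positions.dtype, np.integer))   # True
```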

Method example: `.astype` returns a copy of an array converted to a different data type.

In [11]:
pos_1_float = pos_1.astype(np.float64)

In the code box below, write code to convert the `pos_1` array to a `float32` array.

In [12]:
# COMPLETE THIS LINE OF CODE
pos_1_float32 = pos_1.astype(np.float32)
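Two details of `.astype` are easy to miss, and the sketch below illustrates both: the method returns a new array and leaves the original untouched, and converting floats to integers drops the fractional part rather than rounding. The array `values` is a hypothetical example.

```python
import numpy as np

pos_1 = np.array([1, 2, 3])
pos_1_float = pos_1.astype(np.float64)

# Changing the converted copy does not affect the original array
pos_1_float[0] = 10.5
print(pos_1)         # still 1, 2, 3
print(pos_1_float)   # first element is now 10.5

# Float -> integer conversion truncates toward zero (no rounding)
values = np.array([1.7, -1.7])
truncated = values.astype(np.int64)
print(truncated)     # 1 and -1
```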

Similar to lists, the `len()` function can be used to check the length of 1-D arrays.

In [13]:
len(pos_1)                                      # COMPLETE THIS LINE OF CODE

3

### Indexing: Accessing elements of an array
Indexing a one-dimensional array follows the same syntax as indexing a list: the first element has index `0`, the last element has index `len(array)-1`, and the indices in between increase in steps of 1. To count from the end of an array, use negative indices: the last element has index `-1`, the second to last `-2`, and so on.
So, for example, if we want to access the $x$ coordinate of the first particle:

In [14]:
pos_1[0]

1

Calculate the sum of the $y$ and $z$ coordinates of the second particle (i.e. `pos_2`)

In [15]:
pos_2[1] + pos_2[2] # COMPLETE THIS LINE OF CODE

11
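The negative indices described above can be tried on `pos_2` as well. This short sketch also shows what happens when an index falls outside the array: NumPy raises an `IndexError`, just as a list would.

```python
import numpy as np

pos_2 = np.array([4, 5, 6])

print(pos_2[-1])               # 6, the last element
print(pos_2[len(pos_2) - 1])   # 6, the same element via a positive index
print(pos_2[-3])               # 4, counting three back from the end

# An index outside the valid range raises an IndexError
out_of_range = False
try:
    pos_2[3]
except IndexError as err:
    out_of_range = True
    print("IndexError:", err)
```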

**BONUS:** Can you guess what would happen if we added the two arrays without accessing elements? Try it below!

In [16]:
## INSERT CODE HERE ##
pos_1 + pos_2

array([5, 7, 9])

### Automatically generating arrays
Sometimes it is useful to automatically generate an array of a given length. Some common functions for doing this are `np.ones`, `np.zeros`, `np.arange`, `np.linspace` and `np.logspace`. Let's check the documentation for `np.zeros` and `np.ones` by typing `?` after the function name:  
(In Jupyter, the documentation can also be accessed by pressing `Shift`+`Tab` after typing the function name, i.e. `np.zeros` -> `Shift`+`Tab`.) 

In [17]:
np.ones?

Signature: np.ones(shape, dtype=None, order='C', *, like=None)
Docstring:
Return a new array of given shape and type, filled with ones.

Parameters
----------
shape : int or sequence of ints
    Shape of the new array, e.g., ``(2, 3)`` or ``2``.
dtype : data-type, optional
    The desired data-type for the array, e.g., `numpy.int8`.  Default is
    `numpy.float64`.
order : {'C', 'F'}, optional, default: C
    Whether to store multi-dimensional data in row-major
    (C-style) or column-major (Fortran-style) order in
    memory.
like : array_like, optional
    Reference object to allow the creation of arrays which are not
    NumPy arrays. If an array-like passed in as ``like`` supports
    the ``__array_function__`` protocol, the result will

**BONUS:** Can you make an array that has one hundred elements all equal to 5?

In [18]:
## INSERT CODE HERE ##
np.ones(100) * 5

array([5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.,
       5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.,
       5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.,
       5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.,
       5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.,
       5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.])
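Multiplying `np.ones` by 5 is one way to do this. Another option is `np.full`, which fills a new array with a given value directly. The sketch below compares the two; the variable names are arbitrary.

```python
import numpy as np

fives_from_ones = np.ones(100) * 5   # build ones, then scale them
fives_from_full = np.full(100, 5.0)  # fill with the value directly

print(fives_from_full.shape)                              # (100,)
print(np.array_equal(fives_from_ones, fives_from_full))   # True
```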

`np.arange` returns evenly spaced values within a given interval. Let's check the documentation for `np.arange`.

In [19]:
np.arange?

Docstring:
arange([start,] stop[, step,], dtype=None, *, like=None)

Return evenly spaced values within a given interval.

``arange`` can be called with a varying number of positional arguments:

* ``arange(stop)``: Values are generated within the half-open interval
  ``[0, stop)`` (in other words, the interval including `start` but
  excluding `stop`).
* ``arange(start, stop)``: Values are generated within the half-open
  interval ``[start, stop)``.
* ``arange(start, stop, step)`` Values are generated within the half-open
  interval ``[start, stop)``, with spacing between values given by
  ``step``.

For integer arguments the function is roughly equivalent to the Python
built-in :py:class:`range`, but returns an ndarray rather than a ``range``
instance.

When using a non-integer step, such as 0.1, it is often better to use
`numpy.linspace`.


Parameters
----------
start : integer or real, optional
    Start of interval.  The interval includes this value.  The default
    st

How can we use `np.arange` to generate an array whose elements run from 0 up to (but not including) 100 in steps of 10?

In [20]:
## INSERT CODE HERE ##
np.arange(0,100,10)

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Now look up the documentation for `np.linspace` and generate the same array we generated above using `np.linspace`. What is the difference between `np.arange` and `np.linspace`?

In [21]:
## INSERT CODE HERE ##
np.linspace(0, 90, 10)

array([ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90.])
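One way to see the difference between the two functions is to build the same values both ways. In this sketch, `np.arange` is given a step and excludes its stop value, while `np.linspace` is given a number of points and includes its stop value by default (this can be switched off with `endpoint=False`). The variable names are arbitrary.

```python
import numpy as np

by_step = np.arange(0, 100, 10)      # step of 10, stop value 100 is excluded
by_count = np.linspace(0, 90, 10)    # 10 points, stop value 90 is included

print(by_step)
print(by_count)
print(np.allclose(by_step, by_count))   # True: same values

# arange keeps integers here, linspace returns floats
print(np.issubdtype(by_step.dtype, np.integer))   # True
print(by_count.dtype)                              # float64

# With endpoint=False, linspace excludes the stop value like arange does
no_endpoint = np.linspace(0, 100, 10, endpoint=False)
print(np.allclose(no_endpoint, by_step))           # True
```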

**BONUS:** Look up the documentation for `np.logspace` and create an array using this function. When might this type of array be useful?

In [22]:
## INSERT CODE HERE ##
np.logspace?

Signature:
np.logspace(
    start,
    stop,
    num=50,
    endpoint=True,
    base=10.0,
    dtype=None,
    axis=0,
)
Docstring:
Return numbers spaced evenly on a log scale.

In linear space, the sequence starts at ``base ** start``
(`base` to the power of `start`) and ends with ``base ** stop``
(see `endpoint` below).

.. versionchanged:: 1.16.0
    Non-scalar `start` and `stop` are now supported.

Parameters
----------
start : array_like
    ``base ** start`` is the starting

In [23]:
np.logspace(0, 2, 10)


array([  1.        ,   1.66810054,   2.7825594 ,   4.64158883,
         7.74263683,  12.91549665,  21.5443469 ,  35.93813664,
        59.94842503, 100.        ])
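As the docstring says, `np.logspace` works with exponents: the values are `base ** start` up to `base ** stop`. The sketch below checks this by building the same array by hand from `np.linspace`, and shows that neighbouring elements differ by a constant factor rather than a constant step.

```python
import numpy as np

# Evenly spaced exponents from 0 to 2, then raise 10 to each of them
exponents = np.linspace(0, 2, 10)
by_hand = 10.0 ** exponents

by_logspace = np.logspace(0, 2, 10)
print(np.allclose(by_hand, by_logspace))   # True

# Consecutive elements have a constant ratio (even spacing on a log scale)
ratios = by_logspace[1:] / by_logspace[:-1]
print(np.allclose(ratios, ratios[0]))      # True
```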

**Random Arrays:** We can also generate arrays whose elements are random numbers following a specific distribution. The `np.random` module contains a number of functions that can be used to this effect. The following will create a one dimensional array with 5 samples drawn from a standard normal distribution. More such functions can be found in the [documentation](https://numpy.org/doc/stable/reference/random/index.html).

In [24]:
from numpy.random import default_rng
rng = default_rng()
rng.normal(size=5)

array([0.57620624, 0.15962392, 0.3886461 , 0.05678079, 0.50858692])
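Because the numbers are random, the output changes on every run. If reproducible results are needed, a seed can be passed to `default_rng`. The sketch below assumes an arbitrary seed of `42` and also draws from two other distributions provided by the generator.

```python
import numpy as np
from numpy.random import default_rng

# Two generators created with the same (arbitrary) seed give the same samples
rng_a = default_rng(seed=42)
rng_b = default_rng(seed=42)
samples_a = rng_a.normal(size=5)
samples_b = rng_b.normal(size=5)
print(np.array_equal(samples_a, samples_b))   # True

# Other distributions: uniform floats in [0, 1) and integers in [0, 10)
uniform = rng_a.uniform(low=0.0, high=1.0, size=5)
integers = rng_a.integers(low=0, high=10, size=5)
print(uniform)
print(integers)
```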

### Slicing: Extracting chunks of an array

We can extract smaller arrays from a longer array using the syntax `array[start_index:stop_index]`. Note that the returned array will **include** the element at `start_index` and **exclude** the element at `stop_index`, i.e. it will end at `stop_index-1`. For example, let's generate an array with 100 elements:

In [25]:
big_array = np.arange(100)

**NOTE:** In addition to the usage of `np.arange` shown previously, if you put only an integer as the argument of `np.arange`, it will return an array of integers starting from `0` and of length equal to the argument.  

To get a smaller slice of `big_array` beginning at the index `20` and ending at the index `50-1` we do:

In [26]:
big_array[20:50]

array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
       37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])

In [27]:
big_array[20:50:2]

array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48])

The syntax `array[start_index:stop_index:step]` can be used to get a slice of an array in which every `step`-th element is selected. If no `step` is provided, it is assumed to be 1, so all elements in the given range are returned.

So, if we want to extract the sequence `[65, 68, 71, 74, 77, 80, 83]` from `big_array`, what should we do? 

In [28]:
big_array[65:86:3]    # COMPLETE THIS LINE OF CODE

array([65, 68, 71, 74, 77, 80, 83])

**BONUS:** What happens when we change the `stop_index` above to `85`? What about `87`? Why?

In [29]:
## INSERT CODE HERE ##
print(big_array[65:85:3])
print(big_array[65:87:3])

[65 68 71 74 77 80 83]
[65 68 71 74 77 80 83 86]


**Note:** Doing `big_array[:]` is equivalent to selecting the whole array. The elements of an array can be reversed by selecting the whole array and having a step of `-1`. Try it below!

In [30]:
### INSERT CODE HERE ##
big_array[::-1]

array([99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83,
       82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66,
       65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49,
       48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32,
       31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15,
       14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0])
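Any of the three parts of a slice can be left out, in which case NumPy uses the start of the array, the end of the array, or a step of 1. A few combinations, sketched on the same `big_array`:

```python
import numpy as np

big_array = np.arange(100)

first_five = big_array[:5]     # start omitted -> begins at index 0
last_five = big_array[95:]     # stop omitted -> runs to the end
also_last = big_array[-5:]     # negative start counts from the end
every_25th = big_array[::25]   # only a step -> whole array, every 25th element

print(first_five)    # [0 1 2 3 4]
print(last_five)     # [95 96 97 98 99]
print(also_last)     # [95 96 97 98 99]
print(every_25th)    # [ 0 25 50 75]
```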

What if the array is two-dimensional, i.e. built from a list of lists? How would you access the number 2 in the array below?

In [31]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

In [32]:
### INSERT CODE HERE ##
a[0][1]
# Another way to do it would be to use the following code:
a[0,1]


2

Both of the methods above do the same thing. The syntax `array[which_list, which_value]` picks one of the inner lists (a row) and then takes one of the numbers from that list. Remember, Python indices begin at `0`.
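Indexing with two positions can be combined with slicing. As a sketch on the same array `a`, a `:` in one position selects everything along that axis, which makes it easy to pull out a whole row, a whole column, or a smaller block.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(a.shape)    # (3, 4): 3 rows, 4 columns

first_row = a[0, :]       # all columns of row 0 -> 1, 2, 3, 4
second_column = a[:, 1]   # all rows of column 1 -> 2, 6, 10
lower_right = a[1:, 2:]   # rows 1-2, columns 2-3 -> [[7, 8], [11, 12]]

print(first_row)
print(second_column)
print(lower_right)
```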

## Numerical operations on arrays  
### Element wise binary operations

All of Python's binary arithmetic operators, i.e. `+`, `-`, `*`, `/` and `**`, work on arrays too! Note that these operators are aliases for NumPy functions. For example, we can write

In [33]:
3 * pos_1

array([3, 6, 9])

which is equivalent to

In [34]:
np.multiply(3, pos_1)

array([3, 6, 9])

Now try squaring each coordinate of particle 2. Try using the NumPy function as well as its operator alias to complete the operation.

In [35]:
## INSERT CODE HERE ##
print(pos_2 ** 2)
print(np.power(pos_2, 2))

[16 25 36]
[16 25 36]
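The same correspondence holds for the other arithmetic operators. The sketch below pairs each operator with the NumPy function it stands for and checks that both give the same result on the two position arrays.

```python
import numpy as np

pos_1 = np.array([1, 2, 3])
pos_2 = np.array([4, 5, 6])

# Each operator gives the same result as the corresponding NumPy function
print(np.array_equal(pos_1 + pos_2, np.add(pos_1, pos_2)))        # True
print(np.array_equal(pos_1 - pos_2, np.subtract(pos_1, pos_2)))   # True
print(np.array_equal(pos_1 * pos_2, np.multiply(pos_1, pos_2)))   # True
print(np.array_equal(pos_1 / pos_2, np.divide(pos_1, pos_2)))     # True

# Note that / is true division: integer inputs give a float result
quotient = pos_1 / pos_2
print(quotient)         # 0.25, 0.4, 0.5
print(quotient.dtype)   # float64
```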


All these operations are done element-wise on NumPy arrays. What does element-wise mean?

Type answer by double clicking on this text:
  
  

If two arrays have the **same shape**, then all these binary operations are performed between the elements at the same index in both arrays. For example, let's add the position vectors of the two particles (*note that we performed this operation in an earlier bonus question*).

In [36]:
pos_1 + pos_2

array([5, 7, 9])

What happens if we exponentiate the elements of the first array to the power of the elements of the second array?  

In [37]:
## INSERT CODE HERE ##
pos_1 ** pos_2

array([  1,  32, 729])

What happens if we try to do such operations between arrays of different shapes?  
The short answer is that it may or may not give you an error! We will look into this in the next part of the tutorial.  
  
**BONUS:** Try this out below.

In [38]:
## INSERT CODE HERE
arr_1 = np.array([1, 2, 3])
arr_2 = np.array([4, 5])
arr_1 ** arr_2

ValueError: operands could not be broadcast together with shapes (3,) (2,) 
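In a script, an error like this stops the program unless it is caught. As a minimal sketch, the mismatch can be handled with a `try`/`except` block; the flag `shapes_mismatch` is a hypothetical name used only for this illustration. Combining an array with a single number, as we did with `3 * pos_1`, works without error.

```python
import numpy as np

arr_1 = np.array([1, 2, 3])
arr_2 = np.array([4, 5])

# Shapes (3,) and (2,) cannot be combined element by element
shapes_mismatch = False
try:
    arr_1 ** arr_2
except ValueError as err:
    shapes_mismatch = True
    print("ValueError:", err)

# An array combined with a single number works fine
squared = arr_1 ** 2
print(squared)   # [1 4 9]
```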