# Numpy


> <font size=+1>Numpy is a scientific standard for handling multidimensional data in Python</font>

__This is one of the most important libraries for Python, if not the most important one__, because:
- It is at the core of many scientific stacks:
    - It is the underlying library for __Pandas__ (we will learn about it later)
    - __PyTorch__ and __Tensorflow__, two of the main Deep Learning libraries for Python, offer APIs that closely mirror Numpy's
    - Many third-party libraries implement ideas we will see here, such as array dimensionality (shapes)

Its popularity could be attributed to a few key traits:
- Ease of use
- Efficiency: Numpy's core is written in C (Python acts as a front-end)
- Intuitive syntax
- It "just works" as you'd expect (and would like it to)


## Installation


We can install Numpy easily with `pip`, or with `conda` (it is available in the main `conda` channel):

```bash
pip install numpy
```

```bash
conda install numpy
```



Once installed, the canonical way to import it in Python is to give it the alias `np`:

```python
import numpy as np
```

Let's take a look at some of the most common elements you will find in Numpy.

## np.ndarray


> <font size=+1>`np.ndarray` is a highly efficient data abstraction written in C, with Python bindings for easier usage</font>

Important traits about `np.ndarray`:
- It can have an arbitrary number of dimensions
- It has a __single `dtype`__ (type of data) shared by all elements, usually numeric, for example:
    - `float32` (C's `float`)
    - `float64` (C's `double`, same precision as Python's `float`); __default__ for floating-point data
    - `int32` (C's `int` on most platforms)
- __It has to be "rectangle-like"__:
    - Every row must have the same length, so we cannot store `3` lists of different sizes in a single numeric `np.ndarray`
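The rectangularity rule can be checked directly. A minimal sketch follows; note that the `ValueError` on ragged input is raised by recent NumPy releases (1.24 and later), while older versions only warned (or silently fell back) and created an array of Python objects.

```python
import numpy as np

rectangular = [[1, 2, 3], [4, 5, 6]]  # every row has 3 elements
ragged = [[1, 2, 3], [4, 5], [6]]     # rows of different lengths

rect_arr = np.array(rectangular)
print(rect_arr.shape)  # (2, 3)

try:
    np.array(ragged)  # NumPy 1.24+ raises here instead of guessing
except ValueError as err:
    print("ValueError:", err)

# Ragged data can only be kept as a 1D array of Python objects (the lists themselves)
obj_arr = np.array(ragged, dtype=object)
print(obj_arr.shape, obj_arr.dtype)  # (3,) object
```

An object array gives up the fixed `itemsize` and contiguous numeric storage, which is exactly what makes `np.ndarray` fast.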

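You can also create arrays by calling the `np.ndarray` constructor directly. However, this is not very intuitive at first: we pass the shape of the array we want, and `np.ndarray` allocates the memory __without initializing it__, so the array contains whatever arbitrary values were left in that memory (they are not random numbers in any meaningful sense).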

```python
import numpy as np

# Shape (2, 2); the contents are uninitialized memory, so expect arbitrary values
nd_array1 = np.ndarray((2, 2))
print(nd_array1)
```

```
[[-1.49166815e-154 -2.68678217e+154]
 [ 9.88131292e-324  2.78134232e-309]]
```


To create an `np.ndarray` out of an object we already have, for example a list, we use the `np.array` function.


### np.array vs np.ndarray


> __`np.array` IS A FACTORY METHOD which creates `np.ndarray` (numpy N-dimensional array) objects__

What is a factory method?

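> Factory methods are functions that create objects for us and, depending on the input we pass, __decide what exactly to build__. `np.array` always returns an `np.ndarray`, but its `dtype` and shape are inferred from the input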

Let's see:
- How to create `np.ndarray` objects from Python objects (`list` and `tuple`)
- How the `dtype` is inferred from the content
- That, at the Python level, all arrays have the same type (`type(array)`)

> __When converting Python data (e.g. lists) into an array, use `np.array` rather than `np.ndarray`, because it infers the `dtype` from the content!__

```python
import numpy as np  # always use this alias!

# Defining arrays
arr1d_int = np.array([1, 2, 3, 4])
arr2d_float = np.array(((1, 2, 3, 4), (5, 6, 7, 8.0)))  # Notice 8.0

print(arr1d_int)
print(arr2d_float)
arr1d_int.dtype, type(arr1d_int), arr2d_float.dtype, type(arr2d_float)
```

```
[1 2 3 4]
[[1. 2. 3. 4.]
 [5. 6. 7. 8.]]

(dtype('int64'), numpy.ndarray, dtype('float64'), numpy.ndarray)
```

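Notice that adding a single float (`8.0`) is enough to make the whole array store floats: NumPy picks one `dtype` able to represent every element. (The default integer `dtype` can vary by platform; for example, NumPy versions before 2.0 use `int32` on Windows.)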
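The same inference rule applies to any mix of types. A quick sketch, using `np.result_type` (a NumPy function that reports the `dtype` its inputs would be promoted to):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.5])  # a single float turns every element into a float

print(ints.dtype, mixed.dtype)  # e.g. int64 float64 (integer size is platform-dependent)
print(mixed)

# The dtype NumPy would pick when combining the two arrays
print(np.result_type(ints.dtype, mixed.dtype))  # float64
```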

### Changing data type


Sometimes we need a data type different from the one NumPy infers automatically. There are two fundamental approaches:

1. Specifying it during creation

   - You can explicitly pass the desired data type (the `dtype` argument) when creating a NumPy array, which controls the type of the stored data from the outset.

2. Casting via `.astype`

   - Casting converts an existing array to another data type using the `.astype` method. Note that casting __creates a new array__ and is not performed in place: the target data type may use a different number of bytes per element, so a new memory buffer is needed.

Now, let's see the casting process in action:

```python
# Does NOT change arr1d_int: astype returns a new array, which is discarded here
arr1d_int.astype("int8")
print(arr1d_int.dtype)

# Correct way: assign the new array back to the variable
arr1d_int = arr1d_int.astype("int8")

arr1d_int.dtype
```

```
int64

dtype('int8')
```

```python
# The dtype can also be given at creation time as a NumPy type object (or as a string, e.g. "int8")
new_arr = np.array([1, 2, 3], dtype=np.int8)
```
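Two consequences of casting are worth seeing explicitly: the result lives in a separate memory buffer, and converting floats to integers discards the fractional part (truncation toward zero, not rounding). A short sketch:

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.5])
ints = floats.astype(np.int32)  # new array; `floats` is left unchanged

print(floats.dtype, ints.dtype)        # float64 int32
print(ints)                            # [ 1 -1  2]  (fractional parts dropped)
print(np.shares_memory(floats, ints))  # False: the data was copied into a new buffer
```

If rounding is what you want, round first (for example with `np.round`) and cast afterwards.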

## Data layout

> __A freshly created `np.ndarray` is kept in memory as a `1D` block of contiguous values__

If so, how can we have, for example, a `3D` array? Numpy stores everything in a "single line" and uses an attribute called `strides` to describe how that line maps onto the array's dimensions.

### strides

> __Strides define HOW MANY BYTES one needs to skip in memory to get to the next element along each dimension__

<p align=center><img src=images/numpy_memory_layout.png width=600></p>

<p align=center><img src=images/numpy_strides.svg width=600></p>

Let's see what these are for our two arrays:

```python
print(
    f"""Int1D itemsize: {arr1d_int.itemsize}
Int1D strides: {arr1d_int.strides}
Float2D itemsize: {arr2d_float.itemsize}
Float2D strides: {arr2d_float.strides}
    """
)
```

```
Int1D itemsize: 1
Int1D strides: (1,)
Float2D itemsize: 8
Float2D strides: (32, 8)
```


- `itemsize` - how many bytes a single element occupies (determined by the `dtype`)
- `strides` - for each dimension, how many bytes we have to jump in order to move to the next element along that dimension
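Since `itemsize` is fixed by the `dtype`, the only stride of a contiguous 1D array is simply its `itemsize`. A quick check across a few common types:

```python
import numpy as np

sizes = {}
for name in ["int8", "int32", "float32", "float64"]:
    a = np.zeros(4, dtype=name)
    sizes[name] = (a.itemsize, a.strides)
    # For a contiguous 1D array, moving to the next element means skipping one item
    print(f"{name}: itemsize={a.itemsize}, strides={a.strides}")
```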

```python
# Explain values below based on the code and output

arr = np.arange(9).reshape(3, 3)

print(arr)
print(f'The data type of each element is: {arr.dtype}')
print(f'The length of each element in bytes is: {arr.itemsize}')
print(f'The strides of the array are: {arr.strides}')
```

```
[[0 1 2]
 [3 4 5]
 [6 7 8]]
The data type of each element is: int64
The length of each element in bytes is: 8
The strides of the array are: (24, 8)
```


Makes sense, right? The second element of the tuple (`8`) is the number of bytes we need to "move to the right" (one `int64` element), and the first element (`24` = 3 elements × 8 bytes) is the number of bytes we need to "move to the next row".
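The rule generalizes: the byte offset of element `[i, j]` is `i * strides[0] + j * strides[1]`. A minimal sketch of that lookup, using a hypothetical helper `element_at` (not a NumPy function) that reads from the flattened data:

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
flat = arr.ravel()  # the same values in their 1D memory order (arr is C-contiguous)

def element_at(a, flat_data, index):
    """Hypothetical helper: locate a[index] using only the strides."""
    byte_offset = sum(i * s for i, s in zip(index, a.strides))
    return flat_data[byte_offset // a.itemsize]

print(element_at(arr, flat, (1, 2)), arr[1, 2])  # 5 5
```

Dividing by `itemsize` turns the byte offset into a position in the 1D sequence, which is why the same code works whether the integers are 4 or 8 bytes wide.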

We can also transpose our array; let's see how this changes the strides:

<p align=center><img src=images/numpy_strides_transposed.svg width=600></p>

__Take note that__:
- We could imagine transposing by physically moving the data around in memory
- __Would that be necessary, or would a change in strides suffice?__ (Compare the strides below with the original ones.)

```python
transposed = arr.T

print(transposed)
transposed.strides
```

```
[[0 3 6]
 [1 4 7]
 [2 5 8]]

(8, 24)
```
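To check what `.T` actually did, we can ask NumPy whether the transposed array shares memory with the original. A short sketch; `np.ascontiguousarray` is used only to show what physically reordering the data would look like:

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
transposed = arr.T

print(np.shares_memory(arr, transposed))  # True: same buffer, no data moved
print(arr.strides, transposed.strides)    # the strides are simply swapped
print(transposed.flags["C_CONTIGUOUS"])   # False: rows are no longer adjacent in memory

# Only an explicit copy physically reorders the data
reordered = np.ascontiguousarray(transposed)
print(np.shares_memory(arr, reordered), reordered.strides)  # False, same strides as arr
```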

## shape


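> `<our_array>.shape` returns the shape of `<our_array>`: a tuple with the number of elements along each dimension (its length is the number of dimensions, also available as `<our_array>.ndim`)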

It is one of the most frequently used attributes in `numpy` and scientific computing, so keep it in mind!
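A few related attributes, sketched on a small array (`ndim`, `size` and `reshape` are standard `np.ndarray` members):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

print(arr.shape)  # (3, 4): 3 rows, 4 columns
print(arr.ndim)   # 2: number of dimensions, i.e. len(arr.shape)
print(arr.size)   # 12: total number of elements, i.e. the product of the shape

# reshape keeps the data and only changes the shape (the product must stay 12)
cube = arr.reshape(2, 3, 2)
print(cube.shape)  # (2, 3, 2)
```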


## Creating `np.ndarray`s


Numpy allows us to easily create data in multiple ways, namely:
- __From standard Python structures (`list`s or `tuple`s)__ (possibly nested)
- __Direct creation of `np.ndarray`__ via:
    - random operations (elements are sampled from some distribution)
    - a single value or a fixed pattern (`zeros`, `ones`, or `eye`, which puts ones on the diagonal)

Let's see a few creation operations (__all of them are listed [here](https://numpy.org/doc/stable/reference/routines.array-creation.html)__). Usually, the main argument we pass is the shape of the array we want to create.

```python
ones = np.ones((3, 2))  # 2D matrix (3 rows, 2 columns) filled with ones
zeros = np.zeros_like(ones)  # zeros with the same shape (and dtype) as `ones`
identity = np.eye(3)  # 3x3 identity matrix

print(ones)
print(f'Shape of "ones" is: {ones.shape}')
print(zeros)
print(f'Shape of "zeros" is: {zeros.shape}')
print(identity)
print(f'Shape of "identity" is: {identity.shape}')
```

```
[[1. 1.]
 [1. 1.]
 [1. 1.]]
Shape of "ones" is: (3, 2)
[[0. 0.]
 [0. 0.]
 [0. 0.]]
Shape of "zeros" is: (3, 2)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Shape of "identity" is: (3, 3)
```
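A few more creation functions from the same list, shown as a quick sketch: `np.full` for an arbitrary constant, `np.arange` and `np.linspace` for ranges, and a scaled `np.eye` for an identity matrix "with some value":

```python
import numpy as np

sevens = np.full((2, 3), 7)      # every element set to the same value
steps = np.arange(0, 10, 2)      # like Python's range(): start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced points, both endpoints included
scaled_eye = 5 * np.eye(2)       # identity matrix with 5 on the diagonal

print(sevens)
print(steps)
print(grid)
print(scaled_eye)
```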


## Creating random np.array


> `numpy` provides means to create random arrays (for example defined by some distribution)

[Here](https://numpy.org/doc/stable/reference/random/index.html) you can see a full list of possibilities;
__all of them are located in the `np.random` module__.

Example usage:

```python
# 10 samples from the standard normal distribution (mean 0, stddev 1)
vals = np.random.standard_normal(10)

vals
```

```
array([-0.24254289,  0.7083943 , -0.33446633,  0.42426222, -1.99369232,
        0.80625689, -0.54320899,  2.36829641, -0.20734505, -0.59212398])
```

```python
# Random NORMAL distribution (mean: 0 and stddev: 1)
vals = np.random.randn(3, 4)

vals
```

```
array([[-0.74751218,  0.09683241,  0.48627724, -0.60063956],
       [ 1.47518089,  2.11843449, -1.02584517,  0.19524806],
       [ 1.11244104, -0.59861959,  0.61149984,  0.67560329]])
```

```python
# Random UNIFORM distribution over the [0, 1) range
vals = np.random.rand(3, 4)

vals
```

```
array([[0.98468015, 0.37304705, 0.53834982, 0.33175601],
       [0.3772029 , 0.5516284 , 0.16610994, 0.2366541 ],
       [0.50624113, 0.64051987, 0.27737038, 0.32808795]])
```
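The functions above use NumPy's legacy global random state. Newer code usually creates a `Generator` with `np.random.default_rng`, which can be seeded for reproducible results. A sketch (the seed `42` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator: same seed, same numbers
normal = rng.standard_normal((3, 4))  # counterpart of np.random.randn(3, 4)
uniform = rng.random((3, 4))          # counterpart of np.random.rand(3, 4), values in [0, 1)
ints = rng.integers(0, 10, size=5)    # integers in [0, 10)

# Re-creating the generator with the same seed reproduces the same numbers
again = np.random.default_rng(seed=42).standard_normal((3, 4))
print(np.array_equal(normal, again))  # True
```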

# Key Takeaways

- Numpy is one of the most important and useful Python libraries
- NumPy allows performing various mathematical operations on arrays easily 
- `ndarray` is a highly efficient data abstraction used by NumPy to store and manipulate data in array structures
- `shape` returns the size of an array along each of its dimensions
- `strides` describe how the elements of an array are laid out in memory
- Numpy provides different ways to slice the arrays to get the data we need