# Understanding `NumPy`: Arrays and Vectorized Computation
NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. It provides a simple yet powerful data structure: the n-dimensional array. Most computational packages providing scientific functionality use NumPy’s array objects as the *`lingua franca`* for data exchange.

### `Here are the top four benefits that NumPy can bring to your code:`

- More speed: NumPy's core routines are compiled C code, so operations on whole arrays typically run far faster than the equivalent Python loops.
- Fewer loops: NumPy helps you to reduce loops and keep from getting tangled up in iteration indices.
- Clearer code: Without loops, your code will look more like the equations you’re trying to calculate.
- Better quality: There are thousands of contributors working to keep NumPy fast, friendly, and bug free.

### `Here are some of the things you’ll find in NumPy:`
- ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
- Mathematical functions for fast operations on entire arrays of data without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.
- A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

### `NumPy is so important for numerical computation in Python partly because it is designed for efficiency on large arrays of data. There are a number of reasons for this:`
- NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.
- NumPy operations perform complex computations on entire arrays without the need for Python for loops.
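
To make the memory claim above concrete, the following sketch compares the bytes held by an array with a rough estimate for an equivalent list. The names and the size `n` are illustrative assumptions, and the list estimate (the list object plus each `int` object) varies with the Python version, so treat the numbers as indicative only.

```python
import sys
import numpy as np

n = 1000  # illustrative size, not taken from the text above
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# An int64 array stores exactly 8 bytes per element in one buffer.
array_bytes = arr.nbytes

# Rough estimate for the list: its pointer table plus every int object.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(array_bytes, list_bytes)
print(arr.flags["C_CONTIGUOUS"])  # the array data sits in one contiguous block
```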


## Importing numpy
`import numpy as np`

In [2]:
import numpy as np
my_arr = np.arange(1000000)
my_list = list(range(1000000))

In [None]:
# Numpy array multiplication time
%time for _ in range(10): my_arr2 = my_arr * 2

In [None]:
# Python list multiplication time
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

### NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.

## 6.1 The NumPy ndarray: A Multidimensional Array Object
One of the key features of NumPy is its **N-dimensional array object**, or **`ndarray`**, which is a fast, flexible container for large datasets in Python. Arrays enable you to `perform mathematical operations on whole blocks of data using similar syntax` to the equivalent operations between scalar elements.

In [None]:
# Generate some random data
data = np.random.randn(2, 3)
data

In [None]:
# Simple mathematical operations with data
data * 10

In [None]:
# Simple mathematical operations with data
data + data/2

### `ndarray:` An ndarray is a generic multidimensional container for `homogeneous data`; that is, `all of the elements must be the same type`. Every array has a `shape`, a tuple indicating the size of each dimension, and a `dtype`, an object describing the data type of the array:

In [None]:
# Data dimension
data.shape

In [None]:
# Data type
data.dtype

## A. Creating ndarrays
The easiest way to create an array is to use the `array` function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.

In [None]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

`Nested sequences`, like a list of equal-length lists, will be converted into a multidimensional array:

In [None]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape inferred from the data. We can confirm this by inspecting the `ndim` and `shape` attributes:

In [None]:
arr2.ndim

In [None]:
arr2.shape

#### Unless explicitly specified (more on this later), np.array tries to infer a good data type for the array that it creates. The data type is stored in a special dtype metadata object; for example, in the previous two examples we have:

In [None]:
arr1.dtype

In [None]:
arr2.dtype

### Creating an ndarray with a particular value
In addition to np.array, there are a number of other functions for creating new
arrays. As examples, `zeros` and `ones` create arrays of 0s or 1s, respectively, with a
given length or shape. `empty` creates an array without initializing its values to any
particular value.

To create a higher dimensional array with these methods, pass a `tuple` for the shape:

In [None]:
np.zeros(10)

In [None]:
np.zeros((3, 6))

In [None]:
np.empty((2, 3, 2))

### Important:
> `It’s not safe to assume that np.empty will return an array of all zeros. In some cases, it may return uninitialized “garbage” values`
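
As a small sketch of the creation functions discussed above, the block below builds arrays with known contents and fills an `np.empty` buffer explicitly before reading it. The variable names are illustrative; `np.full` is a standard NumPy function that the text above does not mention.

```python
import numpy as np

ones = np.ones((2, 3))          # 2 x 3 array of 1.0
sevens = np.full((2, 2), 7.0)   # every element set to 7.0

# np.empty only allocates memory; assign values before relying on them.
scratch = np.empty((2, 3))
scratch[:] = 0.0

print(ones)
print(sevens)
print(scratch)
```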

### Creating array with `np.arange`: 
arange is an array-valued version of the built-in Python range function.

In [None]:
np.arange(15)
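
Like `range`, `np.arange` also accepts start, stop, and step arguments. A minimal sketch, with illustrative variable names:

```python
import numpy as np

evens = np.arange(2, 11, 2)     # start at 2, stop before 11, step 2
halves = np.arange(0, 2, 0.5)   # unlike range, non-integer steps are allowed

print(evens)
print(halves)
```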

## Table A: Array creation functions:
![Array functions](img\Array_creation_functions.png)

## B. Data Types for ndarrays
The data type or dtype is a special object containing the information (or metadata,
data about data) the ndarray needs to interpret a chunk of memory as a particular
type of data:

In [None]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)

In [None]:
arr1.dtype

In [None]:
arr2.dtype

### Table B: NumPy data types :
![NumPy data types](img\NumPy_dat_types.png)

`The numerical dtypes are named the same way: a type name, like float or int, followed by a number indicating the number of bits per element. A standard double-precision floating-point value (what’s used under the hood in Python’s float object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as float64.`
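
The naming rule can be checked directly: the `itemsize` attribute of a dtype reports bytes per element, so multiplying by 8 should recover the number in the name. This is a short sketch; the dtypes chosen are just examples.

```python
import numpy as np

for name in ("int8", "int32", "float32", "float64"):
    dt = np.dtype(name)
    # itemsize is in bytes; the number in the dtype name is in bits
    print(name, dt.itemsize, "bytes =", dt.itemsize * 8, "bits")

# Python's built-in float maps to float64
python_float_is_float64 = np.dtype(float) == np.float64
```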

### You can explicitly convert or cast an array from one dtype to another using ndarray’s `astype` method

In [None]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

In [None]:
float_arr = arr.astype(np.float64)
float_arr.dtype

In [None]:
# If I cast some floating-point numbers to be of integer dtype, the decimal part will be **truncated**
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr)

In [None]:
arr.astype(np.int32)

In [None]:
# If you have an array of strings representing numbers, you can use astype to convert them to numeric form.
# np.bytes_ is the fixed-size byte-string type; its old alias np.string_ was removed in NumPy 2.0.
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
numeric_strings.astype(float)

`It’s important to be cautious when using NumPy's fixed-size string types (numpy.string_, named numpy.bytes_ in current releases), as string data in NumPy is fixed size and may truncate input without warning. pandas has more intuitive out-of-the-box behavior on non-numeric data.`
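
The truncation warning can be illustrated with a short sketch. The array below is a made-up example: its element width is inferred from the longest initial string, and a later, longer assignment is cut to fit.

```python
import numpy as np

labels = np.array(["abc", "de"])   # inferred as a 3-character unicode dtype
print(labels.dtype)

labels[1] = "longer"               # no error or warning is raised
print(labels)                      # the new value is stored as 'lon'
```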

### B. Arithmetic with NumPy Arrays
Arrays are important because they enable you to express batch operations on data
without writing any for loops. NumPy users call this **`vectorization`**. Any arithmetic
operation between equal-size arrays is applied element-wise:

In [None]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

In [None]:
arr * arr

In [None]:
arr - arr

In [None]:
## Arithmetic operations with scalars propagate the scalar argument to each element in the array
1 / arr

In [None]:
arr ** 0.5

In [None]:
## Comparisons between arrays of the same size yield **boolean arrays**:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

In [None]:
# Element wise array comparisons
arr2 > arr
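
To show what vectorization replaces, here is a sketch that computes the same element-wise expression twice, once with explicit Python loops and once as a single array expression. The arrays `a` and `b` and the expression `a * b + 1` are illustrative choices.

```python
import numpy as np

a = np.array([[1., 2., 3.], [4., 5., 6.]])
b = np.array([[0., 4., 1.], [7., 2., 12.]])

# Explicit loops: visit every (row, column) position by hand.
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] * b[i, j] + 1

# Vectorized: the same computation as one expression.
vectorized = a * b + 1

print(np.array_equal(looped, vectorized))
```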

## C. Basic Indexing and Slicing
NumPy array indexing is a rich topic, as there are many ways you may want to select
a subset of your data or individual elements. One-dimensional arrays are simple; on
the surface they act similarly to Python lists:

In [None]:
arr = np.arange(10)
arr

In [None]:
arr[5]

In [None]:
arr[5:8]

In [None]:
arr[5:8] = 0
arr

### Important:
> `An important first distinction from Python’s built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array`

In [None]:
arr_slice = arr[5:8]
arr_slice

In [None]:
arr_slice[1] = 1
arr

In [None]:
# The “bare” slice [:] will assign to all values in an array:
arr_slice[:] = 1
arr

### Important:
> `If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array—for example, arr[5:8].copy().`
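
One way to check whether an object is a view or a copy is `np.shares_memory`. The sketch below uses illustrative names and values to contrast the two cases.

```python
import numpy as np

base = np.arange(10)
view = base[5:8]           # a view: no data is copied
copy = base[5:8].copy()    # an independent copy

view[:] = -1               # changes base as well
copy[:] = 99               # leaves base untouched

print(base)
print(np.shares_memory(base, view), np.shares_memory(base, copy))
```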

With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can pass a comma-separated list of indices to select individual elements. So these are equivalent:

In [None]:
arr2d[0][2], arr2d[0, 2]

In multidimensional arrays, if you omit later indices, the returned object will be a lower dimensional ndarray consisting of all the data along the higher dimensions. 

So in the 2 × 2 × 3 array arr3d:

In [None]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

In [None]:
arr3d.shape

In [None]:
arr3d.ndim

In [None]:
arr3d[0]

In [None]:
## Both scalar values and arrays can be assigned to arr3d[0]:
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d

In [None]:
arr3d[0] = old_values
arr3d

In [None]:
## Similarly, arr3d[0, 0] gives you all of the values whose indices start with (0, 0), forming a 1-dimensional array:
arr3d[0, 0]

### C. Indexing with slices
Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:

In [52]:
arr

array([0, 1, 2, 3, 4, 1, 1, 1, 8, 9])

In [54]:
arr[1:6]

array([1, 2, 3, 4, 1])

### Consider the two-dimensional array `arr2d`. Slicing this array is a bit different

In [3]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [4]:
arr2d[2]

array([7, 8, 9])

### As shown earlier, `arr2d[0][2]` and `arr2d[0, 2]` select the same element:

In [5]:
arr2d[0][2]

3

In [6]:
arr2d[0, 2]

3

In [7]:
# Read the expression arr2d[:2] as "select the first two rows of arr2d".
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [8]:
## You can pass multiple slices just like you can pass multiple indexes:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

### Two-dimensional array slicing
![Two-dimensional array slicing](img\slicing.png)

In [10]:
# Select the second row but only the first two columns:
arr2d[1, :2]

array([4, 5])

In [None]:
# Select the third column but only the first two rows:
arr2d[:2, 2]

In [12]:
# All rows, first column only (slicing keeps the result two-dimensional)
arr2d[:, :1]

array([[1],
       [4],
       [7]])
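
Mixing integer indexes and slices changes the number of dimensions of the result: an integer index drops an axis, while a slice keeps it. A brief sketch, using an illustrative array `grid` with the same values as `arr2d`:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_1d = grid[1, :2]      # integer index on axis 0: result is 1-D
row_2d = grid[1:2, :2]    # slice on axis 0: result stays 2-D

print(row_1d, row_1d.shape)
print(row_2d, row_2d.shape)
```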

### D. Boolean Indexing

In [14]:
data = np.random.randn(7, 4)
data

array([[-0.30564727, -0.82815069, -0.3585549 , -1.01165424],
       [-0.36882545,  2.20616909,  0.09366191,  0.69502366],
       [ 2.34428044, -1.08138719,  0.20795686, -1.06394319],
       [ 0.80774488, -0.49844516,  0.36547331,  1.02033724],
       [ 0.04856309, -0.4382345 , -2.11257395,  0.22305587],
       [ 0.47657487, -0.60130422,  1.0625597 , -0.03285256],
       [-0.33903704, -0.3196372 ,  0.09422315, -0.38962136]])

In [18]:
data < 0.5

array([[ True,  True,  True,  True],
       [ True, False,  True, False],
       [False,  True,  True,  True],
       [False,  True,  True, False],
       [ True,  True,  True,  True],
       [ True,  True, False,  True],
       [ True,  True,  True,  True]])

In [19]:
data[data < 0.5]

array([-0.30564727, -0.82815069, -0.3585549 , -1.01165424, -0.36882545,
        0.09366191, -1.08138719,  0.20795686, -1.06394319, -0.49844516,
        0.36547331,  0.04856309, -0.4382345 , -2.11257395,  0.22305587,
        0.47657487, -0.60130422, -0.03285256, -0.33903704, -0.3196372 ,
        0.09422315, -0.38962136])

In [27]:
data[data < 0] = 0
data

array([[0.        , 0.        , 0.        , 0.        ],
       [0.        , 2.20616909, 0.09366191, 0.69502366],
       [2.34428044, 0.        , 0.20795686, 0.        ],
       [0.80774488, 0.        , 0.36547331, 1.02033724],
       [0.04856309, 0.        , 0.        , 0.22305587],
       [0.47657487, 0.        , 1.0625597 , 0.        ],
       [0.        , 0.        , 0.09422315, 0.        ]])

In [28]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [22]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [29]:
data[names == 'Bob']

array([[0.        , 0.        , 0.        , 0.        ],
       [0.80774488, 0.        , 0.36547331, 1.02033724]])

In [30]:
data[~(names == 'Bob')]

array([[0.        , 2.20616909, 0.09366191, 0.69502366],
       [2.34428044, 0.        , 0.20795686, 0.        ],
       [0.04856309, 0.        , 0.        , 0.22305587],
       [0.47657487, 0.        , 1.0625597 , 0.        ],
       [0.        , 0.        , 0.09422315, 0.        ]])
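
Boolean conditions can also be combined, and a boolean mask can be mixed with a slice on another axis. The sketch below uses `|` (or); `&` (and) works the same way. The Python keywords `and` and `or` do not work on boolean arrays. The `values` array is an illustrative stand-in for `data`.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
values = np.arange(28).reshape((7, 4))   # stand-in for the random data above

# Rows where the name is 'Bob' or 'Will'; parentheses are required.
mask = (names == 'Bob') | (names == 'Will')
selected = values[mask]

# Boolean mask on the rows combined with a slice on the columns.
bob_last_cols = values[names == 'Bob', 2:]

print(selected)
print(bob_last_cols)
```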

In [31]:
# reshape arranges the same data into a new shape; start with a one-dimensional array of 32 elements:
arr = np.arange(32)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])

In [32]:
arr.reshape((8,4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [34]:
arr.reshape((4,8))

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31]])
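
Two details of `reshape` are worth a short sketch: one dimension may be given as `-1` so that NumPy infers it from the array's size, and for a contiguous array such as the output of `np.arange` the result is a view on the same data. The names here are illustrative.

```python
import numpy as np

flat = np.arange(32)
auto = flat.reshape((4, -1))   # -1 is inferred as 8, because 32 / 4 == 8

auto[0, 0] = 100               # writing through the view changes flat too

print(auto.shape)
print(flat[0])
```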

### Transposing Arrays and Swapping Axes :
Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the `transpose` method and also the special `T` attribute:



In [35]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [36]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [38]:
# Matrix dot product: compute arr.T times arr with np.dot
arr = np.random.randn(6, 3)
arr

array([[ 0.29920037,  1.42590106,  0.94982085],
       [-0.90167327,  1.1536556 ,  0.00407251],
       [ 0.03507617,  0.28316696,  0.20702111],
       [ 0.54399925,  0.89925131,  2.71311655],
       [-1.3884297 ,  0.8828471 ,  0.55419584],
       [ 1.20927137, -0.53326038,  1.16056399]])

In [39]:
np.dot(arr.T, arr)

array([[ 4.58977534, -1.98509349,  2.39768437],
       [-1.98509349,  5.31673715,  3.72783138],
       [ 2.39768437,  3.72783138,  9.96007718]])
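
A few properties of the computation above can be verified in a short sketch: `arr.T` is a view, `np.dot(arr.T, arr)` matches the `@` operator, and the product of a matrix's transpose with the matrix is symmetric. The seeded generator and the name `m` are assumptions made only so the example is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)    # seeded for reproducibility
m = rng.standard_normal((6, 3))

gram = np.dot(m.T, m)             # (3 x 6) times (6 x 3) gives 3 x 3

print(gram.shape)
print(np.allclose(gram, m.T @ m))   # same result as the @ operator
print(np.allclose(gram, gram.T))    # symmetric
print(np.shares_memory(m, m.T))     # the transpose is a view
```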

### For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes :


In [71]:
arr = np.arange(16).reshape((2, 2, 4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [74]:
## the axes have been reordered with the second axis first, the first axis second, and the last axis unchanged.
arrT = arr.transpose((1, 0, 2))
arrT

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

In [75]:
arr.transpose((2, 1, 0))

array([[[ 0,  8],
        [ 4, 12]],

       [[ 1,  9],
        [ 5, 13]],

       [[ 2, 10],
        [ 6, 14]],

       [[ 3, 11],
        [ 7, 15]]])

### Simple transposing with .T is a special case of swapping axes. ndarray has the method `swapaxes`, which takes a pair of axis numbers and switches the indicated axes to re-arrange the data:

In [44]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [45]:
arr.swapaxes(1, 2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
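
As a quick check, `swapaxes(1, 2)` should give the same result as `transpose` with the last two axes exchanged, and like `transpose` it returns a view. A sketch with an illustrative array name:

```python
import numpy as np

cube = np.arange(16).reshape((2, 2, 4))

swapped = cube.swapaxes(1, 2)          # exchange axes 1 and 2
permuted = cube.transpose((0, 2, 1))   # same permutation written out in full

print(swapped.shape)
print(np.array_equal(swapped, permuted))
print(np.shares_memory(cube, swapped))
```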

## Universal Functions: Fast Element-Wise Array Functions

A universal function, or `ufunc`, is a function that performs `element-wise` operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

![Universal functions](img\unnfunc.png)


In [47]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [48]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [49]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [76]:
## These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays (thus, binary ufuncs) and return a single array as the result:
x = np.random.randn(8)
y = np.random.randn(8)
x,y

(array([-1.64655176, -0.31236839, -0.48095293, -1.14043338, -0.47828168,
         0.50830927, -2.72043313, -0.0185905 ]),
 array([ 0.23654108, -1.38170205,  0.30001076, -0.00212557, -0.22360332,
         1.17150395, -0.31177258, -1.11254394]))

### numpy.maximum computes the element-wise maximum of the elements in x and y.

In [77]:
np.maximum(x, y)

array([ 0.23654108, -0.31236839,  0.30001076, -0.00212557, -0.22360332,
        1.17150395, -0.31177258, -0.0185905 ])

In [79]:
arr = np.random.randn(7) * 5
arr

array([ -2.44133811, -15.6534694 ,  -1.7969817 ,   2.38587568,
         1.50153421,   1.77276303,   3.63828408])

In [80]:
remainder, whole_part = np.modf(arr)

In [81]:
remainder

array([-0.44133811, -0.6534694 , -0.7969817 ,  0.38587568,  0.50153421,
        0.77276303,  0.63828408])

In [82]:
whole_part

array([ -2., -15.,  -1.,   2.,   1.,   1.,   3.])
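
The three kinds of ufunc seen above (unary, binary, and one returning multiple arrays) can be summarized with small fixed inputs, which makes the results easy to check by hand. The arrays are illustrative.

```python
import numpy as np

a = np.array([1., 4., 9.])
b = np.array([2., 3., 10.])

roots = np.sqrt(a)            # unary ufunc: one input array
total = np.add(a, b)          # binary ufunc: same as a + b
larger = np.maximum(a, b)     # element-wise maximum of the two inputs

# modf returns two arrays: fractional and integral parts (both keep the sign)
frac, whole = np.modf(np.array([2.5, -1.25]))

print(roots, total, larger)
print(frac, whole)
```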

## Mathematical and Statistical Methods :
![Basic array statistical methods](img\statfunc.png)

A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class. You can use aggregations (often called reductions) like sum, mean, and std (standard deviation) either by calling the array instance method or using the top-level NumPy function.

In [84]:
arr = np.random.randn(5, 4)
arr

array([[ 0.69229252, -1.47758345, -0.47343072, -1.3323851 ],
       [ 1.04630701, -0.36105089, -2.16458479,  0.07371429],
       [ 1.79113818, -0.54600214, -1.10558803, -1.12059709],
       [-0.08264508, -0.71761574,  0.49746228, -0.92452492],
       [-0.24003681,  1.137565  , -1.17819985,  1.51814028]])

In [85]:
arr.mean()

-0.24838125336066055

In [86]:
np.mean(arr)

-0.24838125336066055

In [87]:
arr.sum()

-4.967625067213211

### Functions like mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one fewer dimension:


In [92]:
# Compute the mean of each row (across the columns)
arr.mean(axis=1)

array([-0.64777669, -0.3514036 , -0.24526227, -0.30683087,  0.30936716])

In [93]:
# Compute the sum of each column (down the rows)
arr.sum(axis=0)

array([ 3.20705582, -1.96468723, -4.42434112, -1.78565254])
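
The effect of `axis` is easiest to see from the shapes. In this sketch a 5 × 4 array of consecutive integers (an illustrative stand-in for the random data) is reduced along each axis in turn.

```python
import numpy as np

table = np.arange(20).reshape((5, 4))

row_means = table.mean(axis=1)   # one value per row: axis 1 is collapsed
col_sums = table.sum(axis=0)     # one value per column: axis 0 is collapsed

print(row_means.shape, row_means)
print(col_sums.shape, col_sums)
```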

###  `cumulative sum`: `cumsum` and `cumprod` do not aggregate, instead producing an array of the intermediate results:

In [94]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])
arr.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)

### Multidimensional arrays - cumsum
In multidimensional arrays, accumulation functions like cumsum return an array of the same size, but with the partial aggregates computed along the indicated axis according to each lower dimensional slice:

In [95]:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [96]:
arr.cumsum(axis=0)

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)

In [97]:
arr.cumprod(axis=1)

array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]], dtype=int32)
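
One way to read these results: the final entry along the accumulated axis equals the ordinary reduction along that axis. A brief sketch with an illustrative array name:

```python
import numpy as np

grid = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

running_sum = grid.cumsum(axis=0)     # partial sums down each column
running_prod = grid.cumprod(axis=1)   # partial products along each row

# The last row / last column holds the full reduction.
print(np.array_equal(running_sum[-1], grid.sum(axis=0)))
print(np.array_equal(running_prod[:, -1], grid.prod(axis=1)))
```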

### Linear Algebra :
Linear algebra, like matrix multiplication, decompositions, determinants, and other
square matrix math, is an important part of any array library. Unlike some languages
like MATLAB, multiplying two two-dimensional arrays with * is an element-wise
product instead of a matrix dot product. Thus, there is a function dot, both an array
method and a function in the numpy namespace, for matrix multiplication:

In [3]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x,y

(array([[1., 2., 3.],
        [4., 5., 6.]]), array([[ 6., 23.],
        [-1.,  7.],
        [ 8.,  9.]]))

In [4]:
x.dot(y)

array([[ 28.,  64.],
       [ 67., 181.]])

In [6]:
# x.dot(y) is equivalent to np.dot(x, y)
np.dot(x, y)

array([[ 28.,  64.],
       [ 67., 181.]])

In [7]:
np.dot(x, np.ones(3))

array([ 6., 15.])

In [8]:
x @ np.ones(3)

array([ 6., 15.])

`numpy.linalg` has a standard set of matrix decompositions and things like inverse
and determinant. These are implemented under the hood via the same industry-standard
linear algebra libraries used in other languages like MATLAB and R, such as
BLAS, LAPACK, or possibly (depending on your NumPy build) the proprietary Intel
MKL (Math Kernel Library):
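
As a minimal sketch of a few `numpy.linalg` routines, the block below computes an inverse, a determinant, and a QR decomposition. The 2 × 2 matrix `A` is an arbitrary, well-conditioned example chosen so the results are easy to verify by hand.

```python
import numpy as np

A = np.array([[4., 7.], [2., 6.]])

A_inv = np.linalg.inv(A)    # matrix inverse
d = np.linalg.det(A)        # determinant: 4*6 - 7*2 = 10
q, r = np.linalg.qr(A)      # QR decomposition, so that q @ r reproduces A

print(A_inv)
print(d)
print(np.allclose(A @ A_inv, np.eye(2)))
```

Floating-point results from these routines are approximate, so comparisons are better made with `np.allclose` or `np.isclose` than with `==`.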