# NumPy Basics: Arrays and Vectorized Computation

NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. Many computational packages providing scientific functionality use NumPy's array objects as one of the standard interfaces for data exchange.

Here are some of the things you'll find in NumPy:

- ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
- Mathematical functions for fast operations on entire arrays of data without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.
- Fast array-based operations for data munging and cleaning, subsetting and filtering, transformation, and any other kind of computation
- Common array algorithms like sorting, unique, and set operations
- Efficient descriptive statistics and aggregating/summarizing data
- Data alignment and relational data manipulations for merging and joining heterogeneous datasets
- Expressing conditional logic as array expressions instead of loops with if-elif-else branches
- Group-wise data manipulations (aggregation, transformation, and function application)
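
As a small illustration of the conditional-logic item in the list above, here is a minimal sketch using numpy.where. The sample values and the names temps and labels are made up for this example.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
temps = np.array([12.5, 30.1, 18.0, 25.4, 8.9])

# Loop version with if-elif-else branches
labels_loop = []
for t in temps:
    if t < 15:
        labels_loop.append("cold")
    elif t < 25:
        labels_loop.append("mild")
    else:
        labels_loop.append("hot")

# Array-expression version: nested np.where calls replace the branches
labels = np.where(temps < 15, "cold", np.where(temps < 25, "mild", "hot"))

print(labels)
```

Both versions should produce the same labels; the array expression simply evaluates every condition over the whole array at once.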


One of the reasons NumPy is so important for numerical computations in Python is that it is designed for efficiency on large arrays of data. There are a number of reasons for this:

- NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy's library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.
- NumPy operations perform complex computations on entire arrays without the need for Python for loops, which can be slow for large sequences. NumPy is faster than regular Python code because its C-based algorithms avoid the overhead present in regular interpreted Python code.

In [2]:
# To give you an idea of the performance difference, consider a NumPy array
# of one million integers and the equivalent Python list.

import numpy as np  # np is the conventional alias for NumPy
my_arr = np.arange(1_000_000)

my_list = list(range(1_000_000))

# Now multiply each sequence by 2 (%timeit is an IPython magic command)

%timeit my_arr2 = my_arr * 2

%timeit my_list2 = [x * 2 for x in my_list]

1.35 ms ± 13 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
59.8 ms ± 867 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
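
The %timeit magic only works inside IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard library timeit module can be used instead. The repetition count of 10 is an arbitrary choice, and the numbers will differ from machine to machine.

```python
import timeit

import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Time 10 repetitions of each statement; absolute numbers depend on the machine
numpy_time = timeit.timeit(lambda: my_arr * 2, number=10)
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy: {numpy_time / 10 * 1e3:.2f} ms per loop")
print(f"list:  {list_time / 10 * 1e3:.2f} ms per loop")
```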


NumPy-based algorithms are generally 10 to 100 times faster than their pure Python counterparts and use significantly less memory.
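
The memory claim can be checked roughly as follows. This is only an estimate under stated assumptions: sys.getsizeof reports CPython's per-object sizes, and the list total counts the list object plus each integer object it refers to.

```python
import sys

import numpy as np

n = 100_000
my_arr = np.arange(n)
my_list = list(range(n))

# Bytes used by the array's data buffer
arr_bytes = my_arr.nbytes

# Rough estimate for the list: the list object itself plus each int object
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(arr_bytes, list_bytes)
```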

## NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.

In [3]:
import numpy as np

data = np.array([
    [1.5, -0.1, 3],
    [0, -3, 6.5]
])

data

array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]])

In [4]:
# Mathematical operations

data * 10   # every element is multiplied by 10

array([[ 15.,  -1.,  30.],
       [  0., -30.,  65.]])

In [5]:
data + data  # corresponding elements are added together

array([[ 3. , -0.2,  6. ],
       [ 0. , -6. , 13. ]])

In [6]:
from numpy import *

mydata = data
print(mydata)

[[ 1.5 -0.1  3. ]
 [ 0.  -3.   6.5]]


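Avoid wildcard imports like the from numpy import * in the cell above, because the numpy namespace is large and contains a number of functions whose names conflict with built-in Python functions (like min and max). Following the standard import numpy as np convention keeps the two sets of names separate.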
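
To see why the name conflict matters, here is a small sketch comparing the built-in sum with numpy's sum. The two functions interpret their second argument differently, so a wildcard import that silently replaces one with the other can change results.

```python
import builtins

import numpy as np

values = range(5)

# Built-in sum: the second argument is a starting value, so this is -1 + 0 + 1 + 2 + 3 + 4
builtin_result = builtins.sum(values, -1)

# np.sum: the second argument is an axis, so this sums along the last axis
numpy_result = np.sum(values, -1)

print(builtin_result, numpy_result)  # 9 10
```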

In [7]:
data.shape  # shape is a tuple giving the size of each dimension of the array

(2, 3)

In [8]:
data.dtype  # dtype is an object describing the data type of the array

dtype('float64')

## Creating ndarrays

The easiest way to create an array is to use the array function. It accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion:

In [9]:
data1 = [6, 7.5, 8, 0, 1]

In [10]:
import numpy as np
arr1 = np.array(data1)

In [11]:
arr1

array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, like a list of equal length lists, will be converted into a multidimensional array:

In [81]:
data2 = [[1, 2, 3, 4],[5,6,7,8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Since data2 was a list of lists, the NumPy array arr2 has two dimensions, with shape inferred from the data. We can confirm this by inspecting the ndim and shape attributes:

In [82]:
arr2.ndim

2

In [83]:
arr2.shape

(2, 4)

In [84]:
arr2.dtype

dtype('int32')

In [85]:
arr1.dtype

dtype('float64')

In addition to numpy.array, there are a number of other functions for creating new arrays. For example, numpy.zeros and numpy.ones create arrays of 0s or 1s, respectively, with a given length or shape. numpy.empty creates an array without initializing its values to any particular value, so its contents are whatever happened to be in memory.

To create a higher dimensional array with these functions, pass a tuple for the shape:

In [86]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [87]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [88]:
np.empty((2,3,2))

array([[[1.38775222e-311, 3.16202013e-322],
        [0.00000000e+000, 0.00000000e+000],
        [1.14587773e-312, 1.85527432e-051]],

       [[1.55974123e+161, 2.04131075e+184],
        [1.71769871e+185, 1.95207657e+160],
        [1.73762304e-047, 5.06172498e-038]]])

numpy.arange is an array-valued version of the built-in Python range function:

In [89]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Since NumPy is focused on numerical computing, the data type, if not specified, will in many cases be float64 (floating point). Some standard array creation functions:

- array          : Convert input data to an ndarray either by inferring a data type or explicitly specifying a data type; copies the input data by default
- asarray        : Convert input to ndarray, but do not copy if the input is already an ndarray
- arange         : Like the built-in range, but returns an ndarray instead of a list
- ones           : Produces an array of all 1s with the given shape and data type
- ones_like      : Takes another array and produces a ones array of the same shape and data type
- zeros, zeros_like : Like ones and ones_like, but producing arrays of 0s instead
- empty          : Creates a new array by allocating new memory, but does not populate it with any values the way ones and zeros do
- empty_like     : Like empty, but takes another array and produces an uninitialized array of the same shape and data type
- full           : Produces an array of the given shape and data type with all values set to the indicated "fill value"
- eye, identity  : Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere)
## Data Types of ndarrays

The *data type* or dtype is a special object containing the information the ndarray needs to interpret a chunk of memory as a particular type of data:

In [90]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)

arr1.dtype


dtype('float64')

In [91]:
arr2.dtype

dtype('int32')
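
To make the "interpret a chunk of memory" idea concrete, the following sketch reads the same eight bytes with three different dtypes using numpy.frombuffer. The byte string is made up for the example, and the little-endian type codes are chosen so the result does not depend on the machine.

```python
import numpy as np

# Eight raw bytes; the "<" in a type code forces little-endian byte order
raw = b"\x01\x00\x00\x00\x02\x00\x00\x00"

as_int32 = np.frombuffer(raw, dtype="<i4")  # two 4-byte integers
as_int16 = np.frombuffer(raw, dtype="<i2")  # four 2-byte integers
as_uint8 = np.frombuffer(raw, dtype="u1")   # eight 1-byte integers

print(as_int32)  # [1 2]
print(as_int16)  # [1 0 2 0]
print(as_uint8)  # [1 0 0 0 2 0 0 0]
```

The bytes never change; only the dtype used to read them does.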

To learn more about NumPy data types, see:

https://numpy.org/devdocs/reference/arrays.dtypes.html#arrays-dtypes

You can explicitly convert or cast an array from one data type to another using ndarray's astype method:

In [92]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype    # int32 here because NumPy versions before 2.0 default to 32-bit integers on Windows, even with 64-bit Python; elsewhere the default is usually int64

dtype('int32')

In [93]:
# check whether the Python interpreter is a 32-bit or 64-bit build
import platform
print(platform.architecture())

('64bit', 'WindowsPE')


In [94]:
float_arr = arr.astype(np.float64)
float_arr


array([1., 2., 3., 4., 5.])

In [95]:
float_arr.dtype

dtype('float64')

In this example, integers were cast to floating point. If I cast some floating-point numbers to be of integer data type, the decimal part will be truncated:

In [96]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [97]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10])
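
Truncation is not the same as rounding. If rounding is what you want, one option (a sketch, not the only approach) is to round with numpy.rint before casting. Note that rint rounds ties such as 0.5 to the nearest even integer.

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

truncated = arr.astype(np.int32)         # drops the fractional part (toward zero)
rounded = np.rint(arr).astype(np.int32)  # rounds to the nearest integer first

print(truncated)  # [ 3 -1 -2  0 12 10]
print(rounded)    # [ 4 -1 -3  0 13 10]
```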

If you have an array of strings representing numbers, you can use astype to convert them to numeric form:

In [98]:
numeric_strings = np.array(["1.25", "-9.6", "42"], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0; np.bytes_ is the same type
numeric_strings.dtype

dtype('S4')

In [99]:
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])
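
Two related details, sketched with made-up arrays (int_array, calibers, and bad_strings are illustrative names): astype accepts another array's dtype attribute, and a string that does not represent a number makes the conversion fail with a ValueError.

```python
import numpy as np

int_array = np.arange(5)
calibers = np.array([0.22, 0.270, 0.357], dtype=np.float64)

# Reuse another array's dtype instead of naming the type explicitly
as_float = int_array.astype(calibers.dtype)
print(as_float.dtype)  # float64

# A string that is not a number cannot be converted
bad_strings = np.array(["1.25", "abc"])
try:
    bad_strings.astype(float)
    failed = False
except ValueError as exc:
    failed = True
    print("conversion failed:", exc)
```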

There are shorthand type code strings you can also use to refer to a dtype:

In [100]:
zeros_uint32 = np.zeros(8, dtype="u4")
zeros_uint32

array([0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)
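
A few more type codes, as a quick sketch. Each code combines a kind letter (u for unsigned integer, i for signed integer, f for floating point) with the item size in bytes.

```python
import numpy as np

# Each code is a kind letter followed by the item size in bytes
print(np.dtype("u4"))  # uint32
print(np.dtype("i8"))  # int64
print(np.dtype("f4"))  # float32
print(np.dtype("f8"))  # float64

same_type = np.dtype("f8") == np.float64
print(same_type)  # True
```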

> Calling astype always creates a new array (a copy of the data), even if the new data type is the same as the old data type.
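
The copying behavior can be checked with numpy.shares_memory. The last part of this sketch uses astype's copy=False parameter, which, as far as the NumPy documentation describes it, allows the input to be returned when no conversion is needed; treat that part as something to verify on your own NumPy version.

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])

same_dtype_copy = arr.astype(arr.dtype)  # same dtype, still a new array
same_dtype_copy[0] = 99.0                # does not affect arr

print(arr)                                     # [1. 2. 3.]
print(np.shares_memory(arr, same_dtype_copy))  # False

# With copy=False, astype may return the input when no conversion is needed
no_copy = arr.astype(arr.dtype, copy=False)
print(np.shares_memory(arr, no_copy))
```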

## Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this ***vectorization***. Any arithmetic operation between equal-size arrays applies the operation elementwise:

In [101]:
import numpy as np
arr = np.array([[1., 2., 3.,], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [102]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [103]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])
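
For comparison, here is a sketch of what the vectorized expression arr * arr replaces. The explicit double loop is written only to show the equivalence; it is not how you would normally write NumPy code.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Vectorized: one expression, no explicit Python loop
vectorized = arr * arr

# Equivalent explicit loops over every row and column
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

print(np.array_equal(vectorized, looped))  # True
```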

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [104]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [105]:
arr ** 2

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

Comparison operators between arrays of the same size yield Boolean arrays:

In [106]:
arr2 = np.array([
    [0., 4., 1.],
    [7., 2., 12.]
])
arr2 

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [107]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

Evaluating operations between differently sized arrays is called ***broadcasting***.
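
A minimal broadcasting sketch, with illustrative arrays named row and col: a one-dimensional array of length 3 is stretched across the rows of a 2 × 3 array, and a 2 × 1 column is stretched across its columns. Shapes that cannot be lined up raise a ValueError.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])  # shape (2, 3)
row = np.array([10., 20., 30.])               # shape (3,)
col = np.array([[100.], [200.]])              # shape (2, 1)

print(arr + row)  # row is added to each row of arr
print(arr + col)  # col is added to each column of arr

# Shapes that cannot be matched raise an error
try:
    arr + np.array([1., 2.])  # shape (2,) does not line up with (2, 3)
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)  # True
```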

## Basic Indexing and Slicing

In [108]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [109]:
arr[5]

5

In [110]:
arr[5:8]

array([5, 6, 7])

In [111]:
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [112]:
# Slicing: arr_slice refers to the same data as arr[5:8]

arr_slice = arr[5:8]
arr_slice

array([12, 12, 12])

In [113]:
arr_slice[1] = 12345
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

The "bare" slice [:] will assign to all values in an array:

In [114]:
arr_slice[:] = 64
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

Array slices are views on the original array: the data is not copied, and any modification to the view is reflected in the source array. If you are new to NumPy, you might be surprised by this, especially if you have used other array programming languages that copy data more eagerly. As NumPy has been designed to be able to work with very large arrays, you could imagine performance and memory problems if NumPy insisted on always copying data.


> If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array—for example, arr[5:8].copy(). As you will see, pandas works this way, too.
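
The difference between a view and an explicit copy can be verified with numpy.shares_memory, as in this small sketch:

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]          # a view: shares memory with arr
copy = arr[5:8].copy()   # an independent copy

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False

copy[:] = -1   # arr is unchanged
view[:] = 64   # arr[5:8] changes too
print(arr)     # [ 0  1  2  3  4 64 64 64  8  9]
```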

With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

In [115]:
arr2d = np.array([
    [1, 2, 3],[4, 5, 6],[7, 8, 9]
])
arr2d[2]

array([7, 8, 9])

Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can pass a comma-separated list of indices to select individual elements. So these are equivalent:


In [116]:
arr2d[0][2]

3

In [117]:
arr2d[0, 2]

3

In multidimensional arrays, if you omit later indices, the returned object will be a lower dimensional ndarray consisting of all the data along the higher dimensions. So in the 2 × 2 × 3 array arr3d:

In [118]:
arr3d = np.array([
    [
        [1, 2, 3], [4, 5, 6]
    ],
    [
        [7, 8, 9], [10, 11, 12]
    ]
])  # equivalently, on one line: [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

arr3d 

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [119]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

Both scalar values and arrays can be assigned to arr3d[0]:

In [120]:
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [121]:
arr3d[0] = old_values
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Similarly, arr3d[1, 0] gives you all the values whose indices start with (1, 0), forming a one-dimensional array:

In [122]:
arr3d[1, 0]

array([7, 8, 9])

This expression is the same as though we had indexed in two steps:

In [123]:
x = arr3d[1]
x

array([[ 7,  8,  9],
       [10, 11, 12]])

In [124]:
x[0]

array([7, 8, 9])

> This multidimensional indexing syntax for NumPy arrays will not work with regular Python objects, such as lists of lists.
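
A quick sketch of that difference: chained indexing works on a nested list, but passing comma-separated indices to a list raises a TypeError.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr2d = np.array(nested)

print(nested[0][2])  # 3, chained indexing works for lists
print(arr2d[0, 2])   # 3, comma-separated indexing works for ndarrays

try:
    nested[0, 2]     # a list does not accept a tuple of indices
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```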

## Indexing with Slices

Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:

In [125]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In [126]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different.

In [127]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [128]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

As you can see, it has sliced along axis 0, the first axis. A slice, therefore, selects a range of elements along an axis. It can be helpful to read the expression arr2d[:2] as "select the first two rows of arr2d."

You can pass multiple slices just like you can pass multiple indexes:

In [129]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

When slicing like this, you always obtain array views of the same number of dimensions. By mixing integer indexes and slices, you get lower dimensional slices.

For example, I can select the second row but only the first two columns, like so:

In [130]:
lower_dim_slice = arr2d[1, :2]

In [131]:
lower_dim_slice.shape

(2,)

In [132]:
lower_dim_slice

array([4, 5])

Similarly, I can select the third column but only the first two rows, like so:

In [133]:
arr2d[:2, 2]

array([3, 6])
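
The effect on dimensionality is easiest to see by comparing shapes. In this sketch, an integer index drops its axis while a length-one slice keeps it:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2d[1, :2].shape)    # (2,)   integer index drops that axis
print(arr2d[1:2, :2].shape)  # (1, 2) slice keeps the axis, with length 1
print(arr2d[:2, 2].shape)    # (2,)
print(arr2d[:2, 2:].shape)   # (2, 1)
```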

> A colon by itself means to take the entire axis, so you can slice only the higher dimensional axes:

In [134]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

Of course, assigning to a slice expression assigns to the whole selection:

In [135]:
arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

## Fancy Indexing

*Fancy indexing* is a term adopted by NumPy to describe indexing using integer arrays. Suppose we had an 8 x 4 array:

In [143]:
arr = np.zeros((8, 4))

In [145]:
for i in range(8):
    arr[i] = i

arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

To select a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order: 

In [146]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

Hopefully this code did what you expected! Using negative indices selects rows from the end:

In [147]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

Passing multiple index arrays does something slightly different; it selects a one-dimensional array of elements corresponding to each tuple of indices:

In [148]:
arr = np.arange(32).reshape((8,4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [149]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])
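
In the result above, the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected. The sketch below spells out that pairing and shows two ways to get the rectangular block of rows and columns instead. The names rows, cols, and block are illustrative. It also checks that fancy indexing, unlike slicing, copies the data.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Pairs up the indices: elements (1, 0), (5, 3), (7, 1), and (2, 2)
paired = arr[rows, cols]
print(paired)  # [ 4 23 29 10]

# To get the rectangular block of those rows and columns instead,
# select the rows first and then reorder the columns
block = arr[rows][:, cols]
print(block)

# np.ix_ builds the same rectangular selection in one step
block_ix = arr[np.ix_(rows, cols)]

# Fancy indexing copies the data, so editing the result leaves arr unchanged
paired[0] = -1
print(arr[1, 0])  # 4
```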

## Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the transpose method and the special T attribute:


In [150]:
arr = np.arange(15).reshape((3, 5))

In [151]:
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [152]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
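
That the transpose is a view rather than a copy can be checked directly, as in this short sketch:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
transposed = arr.T

print(transposed.shape)                   # (5, 3)
print(np.shares_memory(arr, transposed))  # True

# Writing through the transpose changes the original array
transposed[4, 0] = 100
print(arr[0, 4])  # 100

# For a two-dimensional array, transpose() with no arguments matches .T
print(np.array_equal(arr.transpose(), arr.T))  # True
```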

When doing matrix computations, you may do this very often—for example, when computing the inner matrix product using numpy.dot:


In [153]:
arr = np.array([[0, 1, 0], [1, 2, -2], [6, 3, 2], [-1, 0, -1], [1, 0, 1]])


In [154]:
arr

array([[ 0,  1,  0],
       [ 1,  2, -2],
       [ 6,  3,  2],
       [-1,  0, -1],
       [ 1,  0,  1]])

In [155]:
np.dot(arr.T, arr)

array([[39, 20, 12],
       [20, 14,  2],
       [12,  2, 10]])

In [156]:
arr.T @ arr  # the @ infix operator is another way to do matrix multiplication

array([[39, 20, 12],
       [20, 14,  2],
       [12,  2, 10]])

Simple transposing with .T is a special case of swapping axes. ndarray has the method swapaxes, which takes a pair of axis numbers and switches the indicated axes to rearrange the data:


In [157]:
arr

array([[ 0,  1,  0],
       [ 1,  2, -2],
       [ 6,  3,  2],
       [-1,  0, -1],
       [ 1,  0,  1]])

In [158]:
arr.swapaxes(0, 1)

array([[ 0,  1,  6, -1,  1],
       [ 1,  2,  3,  0,  0],
       [ 0, -2,  2, -1,  1]])
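
swapaxes is more interesting with more than two dimensions. The following sketch uses a made-up 2 × 3 × 4 array (arr3d) to show how the indices move, and checks that the result is a view on the same data.

```python
import numpy as np

arr3d = np.arange(24).reshape((2, 3, 4))

swapped = arr3d.swapaxes(1, 2)  # exchange the last two axes
print(swapped.shape)            # (2, 4, 3)

# Element [i, j, k] of the original is element [i, k, j] of the result
print(arr3d[1, 2, 3], swapped[1, 3, 2])  # 23 23

# No data is copied
print(np.shares_memory(arr3d, swapped))  # True

# For a two-dimensional array, swapping axes 0 and 1 is the transpose
arr2d = np.arange(6).reshape((2, 3))
print(np.array_equal(arr2d.swapaxes(0, 1), arr2d.T))  # True
```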