## NumPy N-dimensional Arrays

#### Motivation:

* Mathematical functions for fast operations on entire arrays of data without having to write loops
* Linear algebra
* Random number generation
* Data munging and cleaning
* Subsetting and filtering
* Transformation and other kinds of computation
* Sorting
* Finding unique values
* Set operations
* Expressing conditional logic as array expressions instead of loops with if-elif-else branches
* Group-wise data manipulations (aggregation, transformation, function application)
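
A few of the items above can be sketched on small made-up arrays. The names `values`, `a`, and `b` are illustrative and do not come from the original notebook:

```python
import numpy as np

values = np.array([3, -1, 4, -1, 5, -9])

# Conditional logic as an array expression: replace negatives with 0
clipped = np.where(values < 0, 0, values)

# Sorting and finding unique values
ordered = np.sort(values)
distinct = np.unique(values)

# Set operations on two arrays
a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5])
common = np.intersect1d(a, b)
either = np.union1d(a, b)

print(clipped, ordered, distinct, common, either)
```

`np.where(condition, x, y)` picks from `x` where the condition holds and from `y` elsewhere, which is what replaces an explicit if-else loop here.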

##### NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.

In [1]:
import numpy as np

In [7]:
# create array of 1M numbers using np.arange function

arr = np.arange(1000000)

# create a list of 1M numbers 

my_list = list(range(1000000))

In [8]:
%%time
# multiply each sequence by 2 and record the processing time for the whole cell
# (%%time must be the first line of the cell; a bare %time line times an empty statement)
for _ in range(10):
    arr2 = arr*2


In [9]:
%%time
for _ in range(10):
    my_list2 = [x * 2 for x in my_list]
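
Outside IPython the `%time` magics are not available. A rough equivalent, assuming the standard-library `timeit` module is an acceptable substitute, is sketched below. Exact timings depend on the machine, so no numbers are shown:

```python
import timeit

import numpy as np

arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Time 10 repetitions of each approach (total seconds for all 10 runs)
numpy_seconds = timeit.timeit(lambda: arr * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy array:        {numpy_seconds:.4f} s")
print(f"List comprehension: {list_seconds:.4f} s")
```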


##### Random numbers are useful for simulating data and visualizing distributions. The np.random.randn function in NumPy's random module draws samples from the standard normal distribution (mean 0, standard deviation 1)

In [52]:
data = np.random.randn(2, 4)
data

array([[-1.93944157,  1.52909749,  0.03962978,  0.33127774],
       [ 0.96961204, -0.78056153,  1.32819683,  0.95496203]])

In [53]:
round(data.mean())

0

In [54]:
round(data.std())

1
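
With only eight samples, the mean and standard deviation land near 0 and 1 only after rounding. A larger sample gets much closer. The sketch below uses NumPy's `default_rng` generator rather than `np.random.randn`; that substitution and the seed value are choices made for this illustration so that it is reproducible:

```python
import numpy as np

# Seed chosen arbitrarily so repeated runs give the same numbers
rng = np.random.default_rng(seed=0)

# One million draws from the standard normal distribution
sample = rng.standard_normal(size=(1000, 1000))

print(sample.shape)
print(sample.mean())  # close to 0
print(sample.std())   # close to 1
```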

## Mathematical operations on NumPy arrays

In [55]:
# multiplication

data*2

array([[-3.87888313,  3.05819499,  0.07925955,  0.66255548],
       [ 1.93922409, -1.56112305,  2.65639367,  1.90992407]])

In [56]:
# addition

data+2

array([[0.06055843, 3.52909749, 2.03962978, 2.33127774],
       [2.96961204, 1.21943847, 3.32819683, 2.95496203]])

In [57]:
self_add = data + data
self_add

array([[-3.87888313,  3.05819499,  0.07925955,  0.66255548],
       [ 1.93922409, -1.56112305,  2.65639367,  1.90992407]])

In [58]:
data - data

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.]])

##### Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [59]:
1/data 

array([[-0.51561234,  0.65398054, 25.23355099,  3.01861513],
       [ 1.03134032, -1.28112899,  0.75290046,  1.04716205]])

In [61]:
# square of each element of array

data ** 2

array([[3.76143359e+00, 2.33813914e+00, 1.57051923e-03, 1.09744941e-01],
       [9.40147518e-01, 6.09276297e-01, 1.76410683e+00, 9.11952487e-01]])
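
One way to read the scalar behaviour: the scalar acts as if it were an array of the same shape filled with that value. A minimal sketch on a made-up array (`grid` is an illustrative name):

```python
import numpy as np

grid = np.array([[1.0, 2.0, 4.0],
                 [8.0, 16.0, 32.0]])

# The scalar 2 is applied to every element
doubled = grid * 2

# This matches multiplying by a same-shaped array filled with 2
twos = np.full_like(grid, 2)
assert np.array_equal(doubled, grid * twos)

# Reciprocal and power also work element by element
reciprocal = 1 / grid
squared = grid ** 2
print(doubled, reciprocal, squared, sep="\n")
```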

##### Comparison of arrays: comparing two arrays of the same shape yields a boolean array

In [62]:
data2 = np.random.randn(2, 4)

data>data2

array([[False,  True,  True, False],
       [False,  True,  True,  True]])
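
The boolean array produced by a comparison can be used as a mask to filter elements. A short sketch with made-up values (`left` and `right` are illustrative names):

```python
import numpy as np

left = np.array([1.5, -0.2, 3.0, 0.7])
right = np.array([1.0, 0.5, 2.0, 0.9])

# Element-wise comparison gives a boolean array
mask = left > right

# Indexing with the mask keeps only the elements where it is True
selected = left[mask]

# True counts as 1, so sum() gives the number of matches
matches = mask.sum()

print(mask, selected, matches)
```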

## Dimension Checking

In [29]:
# check the shape (the size of each dimension) of data

data.shape

(2, 4)

##### The shape is 2x4 (2 rows and 4 columns)

In [30]:
# check the number of dimensions of data

data.ndim

2

##### data is a 2-dimensional array

In [31]:
data.dtype

dtype('float64')

##### The data type is float64 (64-bit floating point)
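
These attributes are related: the total element count is the product of the shape, and the memory used is the element count times the bytes per element. A small sketch, using a zero-filled array as a stand-in for `data`:

```python
import numpy as np

grid = np.zeros((2, 4))

print(grid.shape, grid.ndim, grid.dtype)
print(grid.size)      # total elements: 2 * 4 = 8
print(grid.itemsize)  # bytes per element: 8 for float64
print(grid.nbytes)    # size * itemsize = 64

# Reshaping changes the shape but keeps the same 8 elements
reshaped = grid.reshape(4, 2)
print(reshaped.shape, reshaped.ndim)
```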

## Creating array from a list

In [32]:
nums = [6, 7.5, 8, 0, 1]

arr = np.array(nums)
arr

array([6. , 7.5, 8. , 0. , 1. ])

In [33]:
arr.dtype

dtype('float64')

##### np.array infers a data type from the input. A list that mixes integers and decimals becomes float64, because every element of an array shares one type. To set the data type at creation time, pass the dtype argument to np.array

In [34]:
arr = np.array(nums, dtype=np.int32)
arr

array([6, 7, 8, 0, 1], dtype=int32)
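
Note that 7.5 became 7 above: converting floats to an integer type drops the fractional part. An existing array can be converted the same way with `astype`, which returns a new array and leaves the original unchanged. A brief sketch:

```python
import numpy as np

floats = np.array([6, 7.5, 8, 0, 1])

# astype returns a converted copy; the fractional part is dropped
ints = floats.astype(np.int32)

print(floats.dtype, floats)
print(ints.dtype, ints)

# A list containing only integers is inferred as an integer type
# (the exact width depends on the platform and NumPy version)
all_ints = np.array([6, 7, 8])
print(all_ints.dtype)
```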

## Some built-in functions to create arrays

In [36]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [37]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [40]:
# to create a multidimensional array of zeros, pass a tuple giving the shape as the argument

np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [43]:
np.empty((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [47]:
np.empty((2, 3, 2))

array([[[4.65235132e-310, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]],

       [[0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]]])

#### It’s not safe to assume that np.empty will return an array of all zeros. In some cases, it may return uninitialized “garbage” values
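
A reasonable pattern, sketched below, is to use `np.empty` only when every element will be assigned before it is read, and to use `np.zeros` or `np.full` when a known starting value matters. The variable names are illustrative:

```python
import numpy as np

# np.empty only allocates memory, so assign every element before reading
buffer = np.empty((2, 3))
buffer[:] = 0.0

# When a known starting value is needed, these state it explicitly
zeros = np.zeros((2, 3))
sevens = np.full((2, 3), 7.0)

# zeros_like copies the shape and dtype of an existing array
like = np.zeros_like(sevens)

print(buffer, zeros, sevens, like, sep="\n")
```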

In [48]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [49]:
help(np.arange)

Help on built-in function arange in module numpy:

arange(...)
    arange([start,] stop[, step,], dtype=None)
    
    Return evenly spaced values within a given interval.
    
    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range` function, but returns an ndarray rather than a list.
    
    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use `numpy.linspace` for these cases.
    
    Parameters
    ----------
    start : number, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : number
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off affects the length of `out`.
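
The help text recommends `numpy.linspace` for non-integer steps. A short sketch of both functions: `arange` takes a step and excludes `stop`, while `linspace` takes the number of points and includes both endpoints by default:

```python
import numpy as np

# Integer start, stop, step: behaves like range(2, 10, 3)
stepped = np.arange(2, 10, 3)
print(stepped)  # [2 5 8]

# 11 evenly spaced points from 0.0 to 1.0, both endpoints included
points = np.linspace(0.0, 1.0, num=11)
print(points)
```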
   

## Book Reference: Python for Data Analysis (2nd Ed)