# **NumPy**
* NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python.
* Most computational packages providing scientific functionality use NumPy's array objects as the entity for data exchange.
* Some of the key features found in NumPy:
  * ndarray, an efficient multidimensional array providing **fast array-oriented arithmetic operations** and flexible broadcasting capabilities.
  * Mathematical functions for **fast operations on entire arrays of data without having to write loops**.
  * Tools for reading/writing array **data to disk** and **working with memory-mapped** files.
  * Linear algebra, random number generation, and Fourier transform capabilities.
  * A **C API for connecting NumPy with libraries** written in C, C++, or FORTRAN.

**For most data analysis applications, the main areas of functionality focused on are:**
  * Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations.
  * Common array algorithms like sorting, unique, and set operations
  * Efficient descriptive statistics and aggregating/summarizing data
  * Data alignment and relational data manipulations for merging and joining together heterogeneous datasets
  * Expressing conditional logic as array expressions instead of loops with if-elif-else branches
  * Group-wise data manipulations (aggregation, transformation, function application)

**One of the reasons NumPy is so important for numerical computations in Python is that it is designed for efficiency on large arrays of data. There are a number of reasons for this:**
  * NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy's library of algorithms written in the C language can operate on this memory without any type checking or other overhead.
  * NumPy arrays also use much less memory than built-in Python sequences.
  * NumPy operations perform complex computations on entire arrays without the need for Python for loops.
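
The memory claim above can be checked with a rough sketch. It assumes a 64-bit CPython build, and the list figure is only an estimate (the list's pointer table plus the size of each int object); exact numbers vary by Python version.

```python
import sys
import numpy as np

n = 1_000_000
my_arr = np.arange(n, dtype=np.int64)   # one contiguous buffer of 8-byte integers
my_list = list(range(n))                # a list of pointers to separate int objects

# Bytes used by the array's data buffer: n elements * 8 bytes each.
array_bytes = my_arr.nbytes

# Rough estimate for the list: the list object itself (its pointer table)
# plus every int object it refers to. Exact figures depend on the Python build.
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(f"ndarray data: {array_bytes:,} bytes")
print(f"list + ints : {list_bytes:,} bytes")
```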

**To measure performance difference, consider a NumPy array of one million integers, and the equivalent Python list:**


# NumPy Basics: Arrays and Vectorized Computation

In [3]:
import numpy as np
my_arr = np.arange(1000000)        # one million integers
my_list = list(range(1000000))     # the equivalent Python list

In [2]:
%time for _ in range(10): my_arr2 = my_arr * 2
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Wall time: 21.1 ms
Wall time: 2.93 s
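
`%time` is an IPython magic and only works inside a notebook or IPython shell. As a hedged alternative for plain Python scripts, the same comparison can be sketched with the standard `timeit` module. The array size here is reduced (an assumption, chosen so the sketch runs quickly), and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000                      # smaller than one million so the sketch runs quickly
my_arr = np.arange(n)
my_list = list(range(n))

# Time 10 repetitions of each approach, mirroring the %time loops above.
numpy_seconds = timeit.timeit(lambda: my_arr * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy array: {numpy_seconds:.4f} s")
print(f"Python list: {list_seconds:.4f} s")

# Both approaches compute the same values.
same_values = (my_arr * 2).tolist() == [x * 2 for x in my_list]
print(same_values)
```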


## The NumPy ndarray: A Multidimensional Array Object

* An ndarray is the n-dimensional array object defined in NumPy. It stores a collection of elements of the same type; in other words, every element of an ndarray shares a single data type (dtype).
* Elements of an ndarray are accessed using zero-based indexing.
* Each element of an array occupies the same amount of memory.


* One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python.
* Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.

In [5]:
data = np.random.randn(2, 3)
data

array([[-0.20436361, -0.10685631, -2.39255278],
       [ 0.52395809,  0.12777437, -0.95304297]])

In [6]:
data * 10

array([[ -2.04363614,  -1.06856308, -23.92552783],
       [  5.2395809 ,   1.27774366,  -9.53042974]])

* In the above example all of the elements have been multiplied by 10.

In [7]:
data + data

array([[-0.40872723, -0.21371262, -4.78510557],
       [ 1.04791618,  0.25554873, -1.90608595]])

* In the above example the corresponding values in each “cell” in the array have been added to each other.

* An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type.
* Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array:

In [6]:
data.shape

(2, 3)

In [9]:
data.dtype

dtype('float64')

### Creating ndarrays

* The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.

* The syntax is given below



  > **numpy.array(object, dtype = None, copy = True, order = None, subok = False, ndmin = 0)**

* The parameters are described in the following table

<p align="center">
    <img src="http://raghudathesh.weebly.com/uploads/4/8/9/6/48968251/3_orig.png">
</p>
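
As a small illustration of a few of these parameters, the sketch below exercises `dtype`, `ndmin`, and `copy`. The variable names are made up for the example.

```python
import numpy as np

values = [1, 2, 3]

as_float = np.array(values, dtype=np.float64)   # dtype: force the element type
as_2d = np.array(values, ndmin=2)               # ndmin: pad to at least 2 dimensions

source = np.array(values)
copied = np.array(source, copy=True)            # copy=True: always make a new array
copied[0] = 99                                  # does not touch `source`

print(as_float.dtype)   # float64
print(as_2d.shape)      # (1, 3)
print(source)           # [1 2 3]
```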


* For example, a list is a good candidate for conversion:

In [8]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

* Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [9]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

* Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape inferred from the data.
* We can confirm this by inspecting the ndim and shape attributes:

In [10]:
arr2.ndim

2

In [11]:
arr2.shape

(2, 4)

* Unless explicitly specified (more on this later), np.array tries to infer a good data type (like int, float, etc.) for the array that it creates.
* The data type is stored in a special dtype metadata object; for example, in the previous two examples we have:

In [None]:
arr1.dtype

dtype('float64')

In [None]:
arr2.dtype

dtype('int64')

* In addition to np.array, there are a number of other functions for creating new arrays.

# Ex:
* zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape.
* empty creates an array without initializing its values to any particular value.
* To create a higher dimensional array with these methods, pass a tuple for the shape:

In [None]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [None]:
np.zeros((3, 6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [None]:
np.empty((2, 3, 2))

array([[[5.06245341e-310, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]],

       [[0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]]])

# Note:
* It’s not safe to assume that np.empty will return an array of all zeros.
* In some cases, it may return uninitialized “garbage” values.

* **arange** is an array-valued version of the built-in Python range function:

In [None]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

* The table below shows a short list of standard array creation functions.
* Since NumPy is focused on numerical computing, the data type, if not specified, will in many cases be float64 (floating point).

<p align="center">
    <img src="http://raghudathesh.weebly.com/uploads/4/8/9/6/48968251/1_orig.png">
</p>
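
A few of the standard creation functions can be tried directly. This is only a sketch using functions that exist in NumPy (`ones`, `zeros_like`, `full`, `eye`); the shapes and fill value are arbitrary.

```python
import numpy as np

ones = np.ones((2, 3))                 # 2 x 3 array of 1.0
like = np.zeros_like(ones)             # same shape and dtype as `ones`, filled with 0
filled = np.full((2, 2), 7)            # 2 x 2 array where every element is 7
identity = np.eye(3)                   # 3 x 3 identity matrix

print(ones.dtype)      # float64 (the default when no dtype is given)
print(like.shape)      # (2, 3)
print(filled)
print(identity)
```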

### Data Types for ndarrays

* The data type or dtype is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data:

In [None]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr1.dtype

dtype('float64')

In [None]:
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr2.dtype

dtype('int32')

* **dtypes** are a source of NumPy’s flexibility for interacting with data coming from other systems.
* In most cases they provide a mapping directly onto an underlying disk or memory representation, which makes it easy to read and write binary streams of data to disk and also to connect to code written in a low-level language like C or Fortran.
* The numerical dtypes are named the same way: a type name, like float or int, followed by a number indicating the number of bits per element.
* A standard double-precision floating-point value (what’s used under the hood in Python’s float object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as float64.
* The table below provides a full listing of NumPy’s supported data types.

<p align="center">
    <img src="http://raghudathesh.weebly.com/uploads/4/8/9/6/48968251/2_orig.png">
</p>
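
The naming rule (type name plus number of bits) can be confirmed from the dtype objects themselves. A short sketch, using only the `itemsize` and `nbytes` attributes:

```python
import numpy as np

for name in ["int8", "int32", "int64", "float32", "float64"]:
    dt = np.dtype(name)
    # itemsize is in bytes; multiply by 8 to get the bit width in the name.
    print(f"{name:>8}: {dt.itemsize} bytes = {dt.itemsize * 8} bits")

arr = np.zeros(5, dtype=np.float64)
total_bytes = arr.nbytes   # 5 elements * 8 bytes each
print(total_bytes)         # 40
```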

* You can explicitly convert or cast an array from one dtype to another using ndarray’s **astype method**:

In [None]:
arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype)
float_arr = arr.astype(np.float64)
print(float_arr.dtype)

int64
float64


* In the above example, integers were cast to floating point.
* If you cast floating-point numbers to an integer dtype, the decimal part is truncated:

In [None]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr)
print(arr.astype(np.int32))

[ 3.7 -1.2 -2.6  0.5 12.9 10.1]
[ 3 -1 -2  0 12 10]


* If you have an array of strings representing numbers, you can use **astype** to convert them to numeric form:

In [None]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])

# Note:
* It’s important to be cautious when using the numpy.string_ type (an alias of numpy.bytes_ that was removed in NumPy 2.0), as string data in NumPy is fixed size and may truncate input without warning.
* If casting fails for some reason (like a string that cannot be converted to float64), a ValueError is raised.
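
Both warnings can be reproduced in a few lines. This sketch uses NumPy's default Unicode string dtype rather than the byte-string type; it is fixed-width in the same way. The sample strings are invented for the illustration.

```python
import numpy as np

# 1) A string that is not a number makes the cast fail with ValueError.
bad_strings = np.array(['1.25', 'not-a-number'])
try:
    bad_strings.astype(np.float64)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("cast failed:", exc)

# 2) Fixed-size strings: the element width is set when the array is created.
fruits = np.array(['apple', 'fig'])      # widest input has 5 characters
fruits[1] = 'watermelon'                 # silently cut down to 5 characters
print(fruits.dtype, fruits)
```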

# Extra
* We can use another array’s dtype attribute:

In [None]:
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

# Extra
* There are shorthand type code strings you can also use to refer to a dtype:

In [None]:
empty_uint32 = np.empty(8, dtype='u4')
empty_uint32

array([         0, 1075314688,          0, 1075707904,          0,
       1075838976,          0, 1072693248], dtype=uint32)
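
The shorthand codes combine a kind letter with a size in bytes. A brief sketch comparing a few codes with their full dtype names (`np.zeros` is used instead of `np.empty` so the output is predictable):

```python
import numpy as np

# Each shorthand code is a kind letter plus a size in bytes.
assert np.dtype('u4') == np.uint32    # unsigned integer, 4 bytes
assert np.dtype('i8') == np.int64     # signed integer, 8 bytes
assert np.dtype('f4') == np.float32   # floating point, 4 bytes
assert np.dtype('f8') == np.float64   # floating point, 8 bytes

zeros_u4 = np.zeros(3, dtype='u4')
print(zeros_u4.dtype)   # uint32
```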

# Arithmetic with NumPy Arrays

* Arrays are important because **they enable you to express batch operations on data** without writing any for loops.
* **NumPy users call this vectorization**. Any arithmetic operation between equal-size arrays applies the operation element-wise:

In [None]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [None]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [None]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

* Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [None]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [None]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

* Comparisons between arrays of the same size yield boolean arrays:

In [None]:
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [None]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [None]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

* **Operations between differently sized arrays are called broadcasting and will be discussed later**.

# Basic Indexing and Slicing

* NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements.
* One-dimensional arrays are simple; on the surface they act similarly to Python lists:

In [None]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
arr[5]

5

In [None]:
arr[5:8]

array([5, 6, 7])

In [None]:
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

* As you can see, if you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated (or broadcast, as it is called from here on) to the entire selection.
* An important first distinction from Python’s built-in lists is that array slices are views on the original array.
* This means that the data is not copied, and any modifications to the view will be reflected in the source array.

In [None]:
arr_slice = arr[5:8]
arr_slice

array([12, 12, 12])

* Now, **when values in arr_slice are changed, the mutations are reflected in the original array arr**:

In [None]:
arr_slice[1] = 12345
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

* The **“bare”** slice [:] will assign to all values in an array:

In [None]:
arr_slice[:] = 64
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

* If you are new to NumPy, you might be surprised by this, especially if you have used other array programming languages that copy data more eagerly.
* As NumPy has been designed to work with very large arrays, you could imagine performance and memory problems if NumPy insisted on always copying data.
* If you want a copy of a slice instead of a view, you need to copy the array explicitly, for example with arr[5:8].copy().
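
The difference between a view and an explicit copy can be seen side by side. A minimal sketch, using `np.shares_memory` to check whether two arrays use the same underlying buffer:

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]            # a view: shares memory with arr
copy = arr[5:8].copy()     # an independent copy of the same three elements

copy[:] = -1               # changes only the copy
view[0] = 99               # writes through to arr[5]

print(arr)                 # [ 0  1  2  3  4 99  6  7  8  9]
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))   # True False
```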

* With higher dimensional arrays, you have many more options.
* In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

In [11]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

array([7, 8, 9])

* Individual elements can be accessed recursively, as in arr2d[1][2]. Equivalently, you can pass a comma-separated list of indices to select individual elements.

In [12]:
arr2d[0, 2]

3

In [None]:
arr2d[1][2]

6

In [14]:
arr2d[1:]

array([[4, 5, 6],
       [7, 8, 9]])

* In multidimensional arrays, if you omit later indices, the returned object will be a lower dimensional ndarray consisting of all the data along the higher dimensions.
* So in the 2 × 2 × 3 array arr3d:

In [15]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

* arr3d[0] is a 2 × 3 array:

In [16]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
arr3d[1]

array([[ 7,  8,  9],
       [10, 11, 12]])

In [18]:
arr3d[0,1,2]

6

* Both scalar values and arrays can be assigned to arr3d[0]:

In [None]:
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [None]:
arr3d[0] = old_values
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

*  arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:

In [None]:
arr3d[1, 0]

array([7, 8, 9])

* This expression is the same as though we had indexed in two steps:

In [None]:
x = arr3d[1]
x

array([[ 7,  8,  9],
       [10, 11, 12]])

In [None]:
x[0]

array([7, 8, 9])

* Note that in all of these cases where subsections of the array have been selected, the returned arrays are views.

# Indexing with slices

* Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:

In [None]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In [None]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

* Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different:

In [None]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

* As you can see, it has sliced along axis 0, the first axis.
* A slice, therefore, selects a range of elements along an axis. It can be helpful to read the expression arr2d[:2] as **select the first two rows of arr2d**.
* You can pass multiple slices, just as you can pass multiple indexes:

In [None]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

**Guess the output**

In [19]:
arr2d[:2, 0:]

array([[1, 2, 3],
       [4, 5, 6]])

* When slicing like this, you always obtain array views with the same number of dimensions.
* By mixing integer indexes and slices, you get lower dimensional slices.

* For example, we can select the second row but only the first two columns like so:

In [None]:
arr2d[1, :2]

array([4, 5])

* Similarly, we can select the third column but only the first two rows like so:

In [None]:
arr2d[:2, 2]

array([3, 6])
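
To make the difference in dimensionality explicit, the following sketch selects the same two values twice, once with slices on both axes and once with an integer index on the first axis, and compares the resulting shapes:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_as_2d = arr2d[1:2, :2]   # slice on both axes -> still two-dimensional
row_as_1d = arr2d[1, :2]     # integer on axis 0   -> one dimension is dropped

print(row_as_2d, row_as_2d.shape)   # [[4 5]] (1, 2)
print(row_as_1d, row_as_1d.shape)   # [4 5] (2,)
```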

* A colon by itself means to take the entire axis, so you can slice only higher dimensional axes by doing:

In [None]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

* This expression can be read as: select all the rows, but only the first column.

* Assigning to a slice expression assigns to the whole selection:

In [None]:
arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

### Boolean Indexing
* Let’s consider an example where we have some data in an array and an array of names with duplicates.
* Here we use the randn function in **numpy.random** to generate some random normally distributed data:

In [None]:
names = np.array(['Mohit', 'Varshini', 'Sunil', 'Mohit', 'Varshini', 'Sunil', 'Sunil'])
data = np.random.randn(7, 4)
print(names)
print(data)

['Mohit' 'Varshini' 'Sunil' 'Mohit' 'Varshini' 'Sunil' 'Sunil']
[[ 1.61573158 -0.21467493 -0.55691091 -1.35529545]
 [ 0.63141381 -0.5910019   0.34113241  0.72124421]
 [-1.46798961  0.21982993  0.83232813 -1.06998477]
 [ 0.62967722  0.17273545  0.97245692 -1.13220514]
 [ 1.58912009  1.21402402 -1.17556876 -0.8562653 ]
 [ 0.10232066  1.62910022 -0.87208083 -1.26009825]
 [ 1.46412524 -0.60206117 -1.24660508 -1.06261898]]


* Suppose each name corresponds to a row in the data array and we wanted to select all the rows with corresponding name 'Mohit'.
* Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized.
* Thus, comparing names with the string 'Mohit' yields a boolean array:

In [None]:
names == 'Mohit'

array([ True, False, False,  True, False, False, False])

* This boolean array can be passed when indexing the array:

In [None]:
data[names == 'Mohit']

array([[ 1.61573158, -0.21467493, -0.55691091, -1.35529545],
       [ 0.62967722,  0.17273545,  0.97245692, -1.13220514]])

* The boolean array must be of the same length as the array axis it’s indexing.
* You can even mix and match boolean arrays with slices or integers.

* Here we select from the rows where names == 'Mohit' and index the columns, too:

In [None]:
data[names == 'Mohit', 2:]

array([[-0.55691091, -1.35529545],
       [ 0.97245692, -1.13220514]])

In [None]:
data[names == 'Mohit', 3]

array([-1.35529545, -1.13220514])

* To select everything but 'Mohit', you can either use **!=** or negate the condition using **~**:

In [None]:
names != 'Mohit'

array([False,  True,  True, False,  True,  True,  True])

In [None]:
data[~(names == 'Mohit')]

array([[ 0.63141381, -0.5910019 ,  0.34113241,  0.72124421],
       [-1.46798961,  0.21982993,  0.83232813, -1.06998477],
       [ 1.58912009,  1.21402402, -1.17556876, -0.8562653 ],
       [ 0.10232066,  1.62910022, -0.87208083, -1.26009825],
       [ 1.46412524, -0.60206117, -1.24660508, -1.06261898]])

* The ~ operator can be useful when you want to invert a general condition:

In [None]:
cond = names == 'Mohit'
data[~cond]

array([[ 0.63141381, -0.5910019 ,  0.34113241,  0.72124421],
       [-1.46798961,  0.21982993,  0.83232813, -1.06998477],
       [ 1.58912009,  1.21402402, -1.17556876, -0.8562653 ],
       [ 0.10232066,  1.62910022, -0.87208083, -1.26009825],
       [ 1.46412524, -0.60206117, -1.24660508, -1.06261898]])

* To select two of the three names by combining multiple boolean conditions, use boolean arithmetic operators like & (and) and | (or):

In [None]:
mask = (names == 'Mohit') | (names == 'Sunil')
mask

array([ True, False,  True,  True, False,  True,  True])

In [None]:
data[mask]

array([[ 1.61573158, -0.21467493, -0.55691091, -1.35529545],
       [-1.46798961,  0.21982993,  0.83232813, -1.06998477],
       [ 0.62967722,  0.17273545,  0.97245692, -1.13220514],
       [ 0.10232066,  1.62910022, -0.87208083, -1.26009825],
       [ 1.46412524, -0.60206117, -1.24660508, -1.06261898]])

* Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged.
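
Two points about boolean indexing are easy to verify with a small sketch: the selection is a copy, and the Python keywords `and` / `or` cannot replace `&` / `|` for arrays. The data values below are stand-ins chosen for readability, not the random data used above.

```python
import numpy as np

names = np.array(['Mohit', 'Varshini', 'Sunil', 'Mohit'])
data = np.arange(8, dtype=np.float64).reshape(4, 2)   # stand-in values, one row per name

# Boolean indexing returns a copy, so editing the selection leaves `data` alone.
selected = data[names == 'Mohit']
selected[:] = -1
print(np.shares_memory(data, selected))   # False
print(data[0])                            # [0. 1.]

# The Python keywords `and` / `or` do not work element-wise on arrays;
# they ask for a single truth value, which is ambiguous here.
try:
    (names == 'Mohit') or (names == 'Sunil')
    keyword_failed = False
except ValueError:
    keyword_failed = True

mask = (names == 'Mohit') | (names == 'Sunil')   # use | and & instead
print(mask)                                       # [ True False  True  True]
```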

* Setting values with boolean arrays works in a common-sense way.
* To set all of the negative values in data to 0 we need only do:

In [None]:
data[data < 0] = 0
data

array([[1.61573158, 0.        , 0.        , 0.        ],
       [0.63141381, 0.        , 0.34113241, 0.72124421],
       [0.        , 0.21982993, 0.83232813, 0.        ],
       [0.62967722, 0.17273545, 0.97245692, 0.        ],
       [1.58912009, 1.21402402, 0.        , 0.        ],
       [0.10232066, 1.62910022, 0.        , 0.        ],
       [1.46412524, 0.        , 0.        , 0.        ]])

* Setting whole rows or columns using a one-dimensional boolean array is also easy:

In [None]:
data[names != 'Varshini'] = 7
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.63141381, 0.        , 0.34113241, 0.72124421],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [1.58912009, 1.21402402, 0.        , 0.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ]])

### Fancy Indexing

* Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
* Suppose we had an 8 × 4 array:

In [29]:
arr = np.empty((8, 4))

for i in range(8):
    arr[i] = i

arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

* To select out a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order:

In [None]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

* Using negative indices selects rows from the end:

In [None]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

* Passing multiple index arrays does something slightly different; it selects a one-dimensional array of elements corresponding to each tuple of indices:

In [None]:
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [None]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

* Here the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected. Regardless of how many dimensions the array has (here, only 2), the result of fancy indexing with as many integer arrays as there are axes is always one-dimensional.

# Note:
* Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.
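
If the goal is a rectangular block (every chosen row crossed with every chosen column) rather than one element per index pair, two common approaches are sketched below: chaining a row selection with a column selection, or using `np.ix_`. The sketch also checks that the result is a copy.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

pairs = arr[rows, cols]              # one element per (row, col) pair -> 1-D
block_a = arr[rows][:, cols]         # pick rows first, then reorder the columns
block_b = arr[np.ix_(rows, cols)]    # np.ix_ builds the same rectangular selection

print(pairs)       # [ 4 23 29 10]
print(block_b)

# Fancy indexing copies: writing to the result does not change arr.
block_a[0, 0] = -1
print(arr[1, 0])   # 4
```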

### Transposing Arrays and Swapping Axes
* Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything.
* Arrays have the transpose method and also the special T attribute:

In [None]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [None]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

* When doing matrix computations, you may do this very often; for example, when computing the inner matrix product using np.dot:

In [None]:
arr = np.random.randn(6, 3)
arr

array([[-0.00718572, -0.20022277, -1.26057403],
       [-0.9569538 ,  0.26044122, -1.3634736 ],
       [ 0.60256131, -1.44772897,  0.83503816],
       [-0.96793901, -0.7622255 ,  1.18116813],
       [ 0.40691467, -1.96490864,  0.0403457 ],
       [-0.96404316,  0.03334688, -0.0567573 ]])

In [None]:
np.dot(arr.T, arr)

array([[ 3.31075703, -1.2140471 ,  0.7448361 ],
       [-1.2140471 ,  6.64680362, -2.2931028 ],
       [ 0.7448361 , -2.2931028 ,  5.54540319]])
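
The same product can also be written with Python's `@` matrix-multiplication operator. A short sketch comparing the two spellings; the input is random, so only shapes and equality are checked:

```python
import numpy as np

arr = np.random.randn(6, 3)

gram_dot = np.dot(arr.T, arr)   # (3, 6) times (6, 3) -> (3, 3)
gram_at = arr.T @ arr           # the @ operator performs the same matrix product

print(gram_dot.shape)                      # (3, 3)
print(np.allclose(gram_dot, gram_at))      # True
print(np.allclose(gram_dot, gram_dot.T))   # True: X^T X is symmetric
```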

* For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes (for extra mind bending):

In [31]:
arr = np.arange(16).reshape((2, 2, 4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

* Above is a 3D array with two "layers", each layer containing two rows and four columns.
* The data within the array are sequential integers.
---
* Access element at layer 0, row 1, column 2 (value: 6)


In [None]:
element = arr[0, 1, 2]
element

6

* Access entire second row of layer 1 (values: [12, 13, 14, 15])


In [None]:
second_row_layer_1 = arr[1, 1, :]
second_row_layer_1

array([12, 13, 14, 15])

* Access entire first layer (both rows and all columns)


In [None]:
first_layer = arr[0, :, :]
first_layer

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

# **arr.transpose((layer_axis, row_axis, column_axis))**
* The transpose() function in NumPy permutes the dimensions of an array based on the order specified.
* For the given arr example
  * This array has dimensions (2, 2, 4), meaning:
    * 2 layers, each layer containing:
      * 2 rows, and each row containing:
        * 4 columns


Case 1: The identity permutation leaves every axis in place. Original shape: (2, 2, 4). Shape after transpose: (2, 2, 4).

In [None]:
arr.transpose((0, 1, 2))

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

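Case 2: Swapping the layer axis and the row axis, keeping the column axis as it is.

  Here, the first sub-array of the result holds the first row of each original layer, and the second sub-array holds the second row of each layer.

  Original shape: (2, 2, 4). Shape after transpose: (2, 2, 4) (2 layers, 2 rows, 4 columns).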

In [None]:
arr.transpose((1, 0, 2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

Case 3: Swapping the row axis and the column axis, keeping the layer axis as it is.

Original shape: (2, 2, 4). Shape after transpose: (2, 4, 2) (2 layers, 4 rows, 2 columns).

In [None]:
arr.transpose((0, 2, 1))

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

In [33]:
arr.transpose(2,0,1)

array([[[ 0,  4],
        [ 8, 12]],

       [[ 1,  5],
        [ 9, 13]],

       [[ 2,  6],
        [10, 14]],

       [[ 3,  7],
        [11, 15]]])

In [34]:
arr.transpose(1,2,0)

array([[[ 0,  8],
        [ 1,  9],
        [ 2, 10],
        [ 3, 11]],

       [[ 4, 12],
        [ 5, 13],
        [ 6, 14],
        [ 7, 15]]])

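* In arr.transpose((1, 0, 2)) (Case 2 above), the axes were reordered with the second axis first, the first axis second, and the last axis unchanged.
* The last two examples move the column axis to the front (shape (4, 2, 2)) and the layer axis to the end (shape (2, 4, 2)), respectively.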
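
The shape rule behind all of these cases is that axis i of the result is axis axes[i] of the original. The sketch below checks this for every permutation of the three axes, using `itertools.permutations` from the standard library, and confirms that each result is a view.

```python
from itertools import permutations
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

for axes in permutations(range(arr.ndim)):
    result = arr.transpose(axes)
    # Axis i of the result is axis axes[i] of the original.
    expected_shape = tuple(arr.shape[ax] for ax in axes)
    assert result.shape == expected_shape
    assert np.shares_memory(arr, result)   # a view, not a copy
    print(axes, "->", result.shape)

# Element-level reading for axes = (1, 2, 0): result[a, b, c] == arr[c, a, b]
moved = arr.transpose((1, 2, 0))
print(moved[0, 3, 1], arr[1, 0, 3])   # 11 11
```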

# Assignment: Try for 2X3X4
* arr = np.arange(24).reshape((2, 3, 4))
* Case 1: arr.transpose((0, 1, 2))
* Original shape: (2, 3, 4). Shape after transpose: ________

---
* Case 2: arr.transpose((1, 0, 2))
* Original shape: (2, 3, 4). Shape after transpose: ________
---
* Case 3: arr.transpose((0, 2, 1))
* Original shape: (2, 3, 4). Shape after transpose: ________


# Assignment Solution

In [None]:
import numpy as np

In [None]:
arr = np.arange(24).reshape((2, 3, 4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [None]:
arr.transpose((0, 1, 2))

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [None]:
arr.transpose((1, 0, 2))

array([[[ 0,  1,  2,  3],
        [12, 13, 14, 15]],

       [[ 4,  5,  6,  7],
        [16, 17, 18, 19]],

       [[ 8,  9, 10, 11],
        [20, 21, 22, 23]]])

In [None]:
arr.transpose((0, 2, 1))

array([[[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]],

       [[12, 16, 20],
        [13, 17, 21],
        [14, 18, 22],
        [15, 19, 23]]])

* Simple transposing with **.T** is a special case of swapping axes. ndarray has the method swapaxes, which takes a pair of axis numbers and switches the indicated axes to rearrange the data:

In [None]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [None]:
arr.swapaxes(1, 2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

* swapaxes similarly returns a view on the data without making a copy.
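
As a quick check of the relationship between swapaxes, transpose, and .T, the following sketch compares them on small arrays:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)           # exchange the row and column axes
permuted = arr.transpose((0, 2, 1))    # the same reordering written as a permutation

print(swapped.shape)                       # (2, 4, 2)
print(np.array_equal(swapped, permuted))   # True
print(np.shares_memory(arr, swapped))      # True: swapaxes returns a view

# For a 2-D array, .T is the same as swapping axes 0 and 1.
mat = np.arange(6).reshape((2, 3))
print(np.array_equal(mat.T, mat.swapaxes(0, 1)))   # True
```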

# Universal Functions: Fast Element-Wise Array Functions

* A universal function, or **ufunc**, is a function that performs element-wise operations on data in ndarrays.
* You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.
* Many ufuncs are simple element-wise transformations, like sqrt or exp:

In [None]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [None]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

* These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays (thus, binary ufuncs) and return a single array as the result:

In [None]:
x = np.random.randn(8)
y = np.random.randn(8)
x

array([-0.2869084 , -0.88389244,  0.34473302,  0.61051233, -0.23785825,
        1.04468705,  0.57701908, -1.9112522 ])

In [None]:
y

array([ 1.96263332,  0.35200703, -0.28472385, -1.05566558,  0.03175724,
       -0.17300332, -0.83083281, -0.53321045])

In [None]:
np.maximum(x, y)

array([ 1.96263332,  0.35200703,  0.34473302,  0.61051233,  0.03175724,
        1.04468705,  0.57701908, -0.53321045])

* Here, **numpy.maximum** computed the element-wise maximum of the elements in x and y.
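
With fixed inputs the behaviour of binary ufuncs is easier to follow than with random data. The sketch below uses hand-picked values (an assumption for the illustration) and also shows `np.modf`, a ufunc that returns two arrays: the fractional and integral parts.

```python
import numpy as np

x = np.array([1.5, -2.25, 3.0])
y = np.array([0.5, 4.0, -1.0])

total = np.add(x, y)            # binary ufunc, same result as x + y
larger = np.maximum(x, y)       # element-wise maximum of the two arrays

# np.modf is a ufunc with two outputs: fractional and integral parts.
frac, whole = np.modf(x)

print(total)
print(larger)
print(frac, whole)
```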

# Unary ufuncs

<p align="center">
    <img src="http://raghudathesh.weebly.com/uploads/4/8/9/6/48968251/4_orig.png">
</p>

# Binary universal functions

<p align="center">
    <img src="http://raghudathesh.weebly.com/uploads/4/8/9/6/48968251/5_orig.png">
</p>

# Array-Oriented Programming with Arrays

* Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops.
* This practice of replacing explicit loops with array expressions is commonly referred to as vectorization.
* In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact in any kind of numerical computations.

## Example
* Suppose we wished to evaluate the function sqrt(x^2 + y^2) across a regular grid of values.
* The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to all pairs of (x, y) in the two arrays.
* It is commonly used to create coordinate grids, which is particularly useful when you want to evaluate a function over a grid or plot 2D or 3D data.
* More generally, it generates one N-dimensional coordinate array per input array, where each input specifies the values along one dimension of the grid.
* The general syntax of np.meshgrid is as follows:


> * import numpy as np
> * x_values = np.linspace(start_x, end_x, num_points_x)
> * y_values = np.linspace(start_y, end_y, num_points_y)
> * x_grid, y_grid = np.meshgrid(x_values, y_values)

In [2]:
points = np.arange(-5, 5, 0.01) # 1000 equally spaced points

In [3]:
xs, ys = np.meshgrid(points, points)
ys

array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])

* Now, evaluating the function is a matter of writing the same expression you would write with two points:

In [4]:
z = np.sqrt(xs ** 2 + ys ** 2)
z

array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
        7.06400028],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       ...,
       [7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
        7.04279774],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568]])

# Expressing Conditional Logic as Array Operations

* The numpy.where function is a vectorized version of the ternary expression x if condition else y.
* Suppose we had a boolean array and two arrays of values:

In [None]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

* Suppose we wanted to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr.
* A list comprehension doing this might look like:

In [None]:
result = [(x if c else y)
          for x, y, c in zip(xarr, yarr, cond)]
result

[1.1, 2.2, 1.3, 1.4, 2.5]

* This has multiple problems.
  * First, it will not be very fast for large arrays (because all the work is being done in interpreted Python code).
  * Second, it will not work with multidimensional arrays.
* With np.where you can write this very concisely:

In [None]:
result = np.where(cond, xarr, yarr)
result

array([1.1, 2.2, 1.3, 1.4, 2.5])

* The second and third arguments to np.where don’t need to be arrays; one or both of them can be scalars.
* A typical use of where in data analysis is to produce a new array of values based on another array.
* Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with –2.
* This is very easy to do with np.where:

In [None]:
arr = np.random.randn(4, 4)
arr

array([[-1.06971154,  1.27167489, -1.9906885 ,  1.60061165],
       [-0.62205883,  1.29585508,  0.87986964,  1.71554189],
       [ 0.51050315, -0.80909605,  1.99821705,  0.29688139],
       [-1.72623302, -0.28945334,  0.03364243,  0.66822711]])

In [None]:
arr > 0

array([[False,  True, False,  True],
       [False,  True,  True,  True],
       [ True, False,  True,  True],
       [False, False,  True,  True]])

In [None]:
np.where(arr > 0, 2, -2)

array([[-2,  2, -2,  2],
       [-2,  2,  2,  2],
       [ 2, -2,  2,  2],
       [-2, -2,  2,  2]])

* You can combine scalars and arrays when using np.where. For example, I can replace all positive values in arr with the constant 2 like so:

In [None]:
np.where(arr > 0, 2, arr) # set only positive values to 2

array([[-1.06971154,  2.        , -1.9906885 ,  2.        ],
       [-0.62205883,  2.        ,  2.        ,  2.        ],
       [ 2.        , -0.80909605,  2.        ,  2.        ],
       [-1.72623302, -0.28945334,  2.        ,  2.        ]])

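* The arrays passed to np.where can be more than just equal-sized arrays or scalars: the condition and both value arguments are broadcast against each other, so any broadcast-compatible shapes work.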
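
* A short sketch of np.where with arguments of different shapes follows. The shapes and values are made up for illustration: the condition and y are column vectors of shape (2, 1), x is a row of shape (3,), and the result has the broadcast shape (2, 3).

```python
import numpy as np

cond = np.array([[True], [False]])  # shape (2, 1)
x = np.array([1, 2, 3])             # shape (3,)
y = np.array([[10], [20]])          # shape (2, 1)

# Row 0 has cond True, so it takes x; row 1 has cond False, so it takes y
result = np.where(cond, x, y)
print(result.shape)
print(result)
```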

# Mathematical and Statistical Methods

* A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class.
* You can use aggregations (often called reductions) like sum, mean, and std (standard deviation) either by calling the array instance method or using the top-level NumPy function.

In [None]:
arr = np.random.randn(5, 4)
arr

array([[-0.9398638 , -1.01561368, -0.26971867,  0.86312326],
       [ 0.12059833,  1.02891904,  1.10927592,  0.09933233],
       [-1.65504487,  0.58845785, -0.94684019,  0.050397  ],
       [ 1.78295185,  1.52544215, -1.61855098, -0.72558678],
       [-0.15781864,  0.85228881,  0.98075464, -1.20072066]])

In [None]:
arr.mean()

0.02358914492735109

In [None]:
np.mean(arr)

0.02358914492735109

In [None]:
arr.sum()

0.4717828985470218

* Functions like mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one fewer dimension:

In [None]:
arr.mean(axis=1)

array([-0.34051822,  0.58953141, -0.49075755,  0.24106406,  0.11862604])

In [None]:
arr.sum(axis=0)

array([-0.84917714,  2.97949417, -0.74507928, -0.91345485])

* Here, arr.mean(1) means “compute the mean across the columns”, whereas arr.sum(0) means “compute the sum down the rows”.
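
* The axis convention is easier to check on a tiny array whose values can be verified by hand. The 2 × 3 array below is an illustrative choice, and keepdims is an optional argument of these reductions that keeps the reduced axis with length 1.

```python
import numpy as np

small = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

row_means = small.mean(axis=1)  # one value per row
col_sums = small.sum(axis=0)    # one value per column

# keepdims=True keeps the reduced axis, which is handy for broadcasting
row_means_2d = small.mean(axis=1, keepdims=True)

print(row_means)
print(col_sums)
print(row_means_2d.shape)
```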


---



* Other methods like **cumsum** and **cumprod** do not aggregate, instead producing an array of the intermediate results:

In [None]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])
arr.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28])

* In **multidimensional arrays**, accumulation functions like **cumsum return an array of the same size**, but with the partial aggregates computed along the indicated axis according to each lower dimensional slice:

In [None]:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [None]:
arr.cumsum(axis=0)

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]])

In [None]:
arr.cumprod(axis=1)

array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]])

* The table below shows a full list of mathematical and statistical methods.

<p align="center">
    <img src="http://raghudathesh.weebly.com/uploads/4/8/9/6/48968251/6_orig.png">
</p>
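
* The sketch below exercises a few other reductions available as ndarray methods (std, var, min, max, argmin, argmax). The table above is an image, so this selection is an assumption about what it lists, and the input values are chosen so that the results are easy to verify by hand.

```python
import numpy as np

data = np.array([[3.0, 7.0, 1.0],
                 [4.0, 0.0, 9.0]])

print(data.min(), data.max())        # smallest and largest element
print(data.argmin(), data.argmax())  # positions in the flattened array
print(data.argmax(axis=1))           # column index of the maximum in each row
print(data.std(axis=0))              # standard deviation of each column
print(data.var(axis=0))              # variance of each column
```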



# Methods for Boolean Arrays

* Boolean values are coerced to 1 (True) and 0 (False) in the preceding methods. Thus, sum is often used as a means of counting True values in a boolean array:

In [None]:
arr = np.random.randn(100)
(arr > 0).sum() # Number of positive values

52

* There are two additional methods, **any** and **all**, useful especially for boolean arrays.
* any tests whether one or more values in an array is True, while all checks if every value is True:

In [None]:
bools = np.array([False, False, True, False])

In [None]:
bools.any()

True

In [None]:
bools.all()

False
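
* As a small sketch, any and all also accept an axis argument, and they work on non-boolean arrays, where non-zero elements are treated as True. The arrays below are illustrative.

```python
import numpy as np

grid = np.array([[True, False],
                 [True, True]])

all_per_column = grid.all(axis=0)  # is every value in each column True?
any_per_row = grid.any(axis=1)     # is at least one value in each row True?

numbers = np.array([0, 0, 3])
has_nonzero = numbers.any()        # non-zero values count as True
all_nonzero = numbers.all()

print(all_per_column, any_per_row, has_nonzero, all_nonzero)
```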

# Sorting
* Like Python’s built-in list type, NumPy arrays can be sorted in-place with the sort method:

In [None]:
arr = np.random.randn(6)
arr

array([-1.04199843, -1.55289873, -0.64273652, -1.31076374,  1.28852551,
        1.46448558])

In [None]:
arr.sort()
arr

array([-1.55289873, -1.31076374, -1.04199843, -0.64273652,  1.28852551,
        1.46448558])

* You can sort each one-dimensional section of values in a multidimensional array in-place along an axis by passing the axis number to sort:

In [None]:
arr = np.random.randn(5, 3)
arr

array([[-0.6519646 , -0.36472728, -0.14988827],
       [ 0.50328909, -2.63067204, -1.85236026],
       [ 0.17278006,  0.19578855, -0.15483879],
       [-0.52880046, -0.8439432 , -0.55834065],
       [-0.58068217,  0.51008738,  2.44751151]])

In [None]:
arr.sort(1)
arr

array([[-0.6519646 , -0.36472728, -0.14988827],
       [-2.63067204, -1.85236026,  0.50328909],
       [-0.15483879,  0.17278006,  0.19578855],
       [-0.8439432 , -0.55834065, -0.52880046],
       [-0.58068217,  0.51008738,  2.44751151]])

In [None]:
arr.sort(0)
arr

array([[-2.63067204, -1.85236026, -0.52880046],
       [-0.8439432 , -0.55834065, -0.14988827],
       [-0.6519646 , -0.36472728,  0.19578855],
       [-0.58068217,  0.17278006,  0.50328909],
       [-0.15483879,  0.51008738,  2.44751151]])

* The top-level function np.sort returns a sorted copy of an array instead of modifying the array in-place.
* A quick-and-dirty way to compute the quantiles of an array is to sort it and select the value at a particular rank.
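
* A minimal sketch of the sort-and-select approach to quantiles: the array size (1000), the seed, and the 5% level are arbitrary choices for illustration. np.sort is used so that the original array is left untouched.

```python
import numpy as np

rng = np.random.RandomState(0)  # seeded only to make the sketch repeatable
large_arr = rng.randn(1000)

sorted_arr = np.sort(large_arr)  # sorted copy; large_arr keeps its order

# Approximate 5% quantile: the value at rank 0.05 * n in the sorted data
rank = int(0.05 * len(sorted_arr))
q05 = sorted_arr[rank]
print(rank, q05)
```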

# Unique and Other Set Logic

* NumPy has some basic set operations for one-dimensional ndarrays.
* A commonly used one is np.unique, which returns the sorted unique values in an array:

In [None]:
names = np.array(['Mohit', 'Varshini', 'Sunil', 'Mohit', 'Varshini', 'Sunil', 'Sunil'])
np.unique(names)

array(['Mohit', 'Sunil', 'Varshini'], dtype='<U8')

In [None]:
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

array([1, 2, 3, 4])

* Contrast np.unique with the pure Python alternative:

In [None]:
sorted(set(names))

['Mohit', 'Sunil', 'Varshini']

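* Another function, np.in1d, tests membership of the values in one array in another, returning a boolean array (np.in1d is deprecated as of NumPy 2.0; np.isin is the recommended replacement and accepts the same two arguments):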

In [None]:
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.in1d(values, [2, 3, 6])

array([ True, False, False,  True,  True, False,  True])

* The table below lists the set functions in NumPy.


<p align="center">
    <img src="http://raghudathesh.weebly.com/uploads/4/8/9/6/48968251/7_orig.png">
</p>
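
* The other common set functions can be sketched on two small integer arrays. The function choice (intersect1d, union1d, setdiff1d, setxor1d) is an assumption about what the image above lists, and the inputs are illustrative. Each function returns sorted, unique values.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5])

common = np.intersect1d(a, b)  # values present in both arrays
either = np.union1d(a, b)      # values present in at least one array
only_a = np.setdiff1d(a, b)    # values in a that are not in b
exactly_one = np.setxor1d(a, b)  # values in exactly one of the arrays

print(common, either, only_a, exactly_one)
```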


# File Input and Output with Arrays

* NumPy is able to save and load data to and from disk either in text or binary format.
* In this section we shall discuss NumPy’s built-in binary format, since most users will prefer pandas and other tools for loading text or tabular data.

---

* **np.save** and **np.load** are the two workhorse functions for efficiently saving and loading array data on disk.
* Arrays are saved by default in an uncompressed raw binary format with file extension .npy:

In [5]:
arr = np.arange(10)
print(arr)
np.save('some_array', arr)

[0 1 2 3 4 5 6 7 8 9]


* If the file path does not already end in .npy, the extension will be appended.
---
* The array on disk can then be loaded with np.load:

In [6]:
np.load('some_array.npy')

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

* You can save multiple arrays in an uncompressed archive using np.savez and passing the arrays as keyword arguments:

In [None]:
np.savez('array_archive.npz', a=arr, b=arr)

* When loading an .npz file, you get back a dict-like object that loads the individual arrays lazily:

In [None]:
arch = np.load('array_archive.npz')
arch['b']

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

* If your data compresses well, you may wish to use numpy.savez_compressed instead:

In [None]:
np.savez_compressed('arrays_compressed.npz', a=arr, b=arr)
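
* The save and load steps above can be sketched as one self-contained round trip. Writing into a temporary directory is a choice made here so that the sketch leaves no files behind; the file names mirror the ones used above. The object returned by np.load for an .npz file can be used as a context manager, and its files attribute lists the stored names.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Single array: .npy format
    npy_path = os.path.join(tmp_dir, "some_array.npy")
    np.save(npy_path, arr)
    loaded = np.load(npy_path)

    # Several arrays: compressed .npz archive
    npz_path = os.path.join(tmp_dir, "arrays_compressed.npz")
    np.savez_compressed(npz_path, a=arr, b=arr * 2)
    with np.load(npz_path) as arch:  # closes the archive on exit
        keys = sorted(arch.files)
        b_loaded = arch["b"]

print(loaded, keys, b_loaded)
```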

* If you want to remove the saved files from disk, use the commands below:

In [7]:
!rm some_array.npy
!rm array_archive.npz
!rm arrays_compressed.npz

rm: cannot remove 'array_archive.npz': No such file or directory
rm: cannot remove 'arrays_compressed.npz': No such file or directory
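
* The rm command only exists on Unix-like shells and reports an error for files that were never created. A portable alternative, sketched here with the standard library, is pathlib; the missing_ok argument requires Python 3.8 or later.

```python
from pathlib import Path

saved_files = ["some_array.npy", "array_archive.npz", "arrays_compressed.npz"]

for name in saved_files:
    # missing_ok=True silently skips files that do not exist
    Path(name).unlink(missing_ok=True)

remaining = [name for name in saved_files if Path(name).exists()]
print(remaining)
```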


# Linear Algebra
* Linear algebra, like matrix multiplication, decompositions, determinants, and other square matrix math, is an important part of any array library.
* Unlike some languages like MATLAB, multiplying two two-dimensional arrays with * is an element-wise product instead of a matrix dot product.
* Thus, there is a function **dot**, both an array method and a function in the numpy namespace, for matrix multiplication:

In [None]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x

array([[1., 2., 3.],
       [4., 5., 6.]])

In [None]:
y

array([[ 6., 23.],
       [-1.,  7.],
       [ 8.,  9.]])

In [None]:
x.dot(y)

array([[ 28.,  64.],
       [ 67., 181.]])

* x.dot(y) is equivalent to np.dot(x, y):

In [None]:
np.dot(x, y)

array([[ 28.,  64.],
       [ 67., 181.]])

* A matrix product between a two-dimensional array and a suitably sized one dimensional array results in a one-dimensional array:

In [None]:
np.dot(x, np.ones(3))

array([ 6., 15.])

* The @ symbol (as of Python 3.5) also works as an infix operator that performs matrix multiplication:

In [None]:
x @ np.ones(3)

array([ 6., 15.])
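
* To make the difference between the element-wise product and the matrix product explicit, here is a small sketch on a 2 × 2 array chosen so that both results can be checked by hand.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

elementwise = m * m  # multiplies matching entries
matrix_prod = m @ m  # rows of the left operand times columns of the right

print(elementwise)
print(matrix_prod)
print(np.array_equal(matrix_prod, m.dot(m)))  # @ and dot agree for 2D arrays
```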

* numpy.linalg has a standard set of matrix decompositions and things like inverse and determinant.
* These are implemented under the hood via the same industry-standard linear algebra libraries used in other languages like MATLAB and R, such as BLAS, LAPACK, or possibly (depending on your NumPy build) the proprietary Intel MKL (Math Kernel Library):

In [None]:
from numpy.linalg import inv, qr
X = np.random.randn(5, 5)
mat = X.T.dot(X)
inv(mat)

array([[  9.54501177,   1.33758295,  -4.77735282,  18.37882253,
         -6.43716868],
       [  1.33758295,   0.32965836,  -0.63885362,   2.74334242,
         -0.8726838 ],
       [ -4.77735282,  -0.63885362,   2.63929448,  -9.22218777,
          3.3341088 ],
       [ 18.37882253,   2.74334242,  -9.22218777,  36.06731155,
        -12.28872393],
       [ -6.43716868,  -0.8726838 ,   3.3341088 , -12.28872393,
          4.60431041]])

In [None]:
mat.dot(inv(mat))

array([[ 1.00000000e+00,  5.31325539e-15,  8.00619063e-16,
         2.64055142e-14,  8.50661463e-15],
       [ 1.22774536e-15,  1.00000000e+00,  3.14124512e-15,
        -3.80130208e-15,  7.04990670e-15],
       [ 5.95333613e-16, -7.70582404e-16,  1.00000000e+00,
        -1.17773992e-14,  1.51637982e-15],
       [-1.20621765e-14, -1.79261777e-15,  2.87494458e-15,
         1.00000000e+00,  2.77581011e-15],
       [ 1.39917847e-14,  1.84183876e-15, -9.43591572e-16,
         2.17747957e-14,  1.00000000e+00]])

In [None]:
q, r = qr(mat)
r

array([[-12.70939427,  -7.11541175,   2.71127922,   5.6598904 ,
         -6.04746394],
       [  0.        ,  -9.26414745,   2.07463553,   1.53359227,
          0.872947  ],
       [  0.        ,   0.        ,  -5.56464836,  -0.25593205,
          3.42566326],
       [  0.        ,   0.        ,   0.        ,  -0.7879716 ,
         -2.27541253],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.06658867]])

In [None]:
q

array([[-0.83084203,  0.26348153, -0.17101349,  0.16522409, -0.42864249],
       [-0.27309338, -0.93403007, -0.13663434, -0.17596027, -0.05811085],
       [ 0.05935662,  0.16143262, -0.90495826, -0.3196509 ,  0.22201387],
       [ 0.34500666,  0.03773846, -0.00957913, -0.45810076, -0.81828977],
       [-0.33551263,  0.17515886,  0.36475229, -0.79354016,  0.3065949 ]])

* The expression X.T.dot(X) computes the dot product of the transpose X.T with X.
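
* Because floating-point results such as mat.dot(inv(mat)) are only approximately equal to the identity, it is safer to verify them with np.allclose than by eye. The sketch below uses a small hand-written matrix (an illustrative choice, not the random X from above) so that it behaves the same on every run.

```python
import numpy as np
from numpy.linalg import inv, qr

X = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])
mat = X.T @ X

# The product of a matrix and its inverse is the identity up to rounding
inverse_ok = np.allclose(mat @ inv(mat), np.eye(3))

# QR decomposition: q has orthonormal columns, r is upper triangular
q, r = qr(mat)
qr_ok = np.allclose(q @ r, mat)
q_orthonormal = np.allclose(q.T @ q, np.eye(3))
r_upper = np.allclose(r, np.triu(r))

print(inverse_ok, qr_ok, q_orthonormal, r_upper)
```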
---
* The table below lists some of the most commonly used linear algebra functions.

<p align="center">
    <img src="http://raghudathesh.weebly.com/uploads/4/8/9/6/48968251/8_orig.png">
</p>
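
* Two more numpy.linalg functions, det and solve, are sketched below. Whether they appear in the image above is an assumption; the 2 × 2 system is chosen so that the answer (x = 2, y = 3) can be verified by substitution.

```python
import numpy as np
from numpy.linalg import det, solve

# System of equations: 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

solution = solve(A, b)  # solves A @ solution = b without forming inv(A)
determinant = det(A)    # 3*2 - 1*1 = 5, up to floating-point rounding

print(solution, determinant)
```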

# Pseudorandom Number Generation
* The numpy.random module supplements the built-in Python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.
* For example, you can get a 4 × 4 array of samples from the standard normal distribution using normal:

In [None]:
samples = np.random.normal(size=(4, 4))
samples

array([[-1.23474622, -0.2260722 ,  0.81139174,  0.17149235],
       [ 0.3373211 , -0.2020664 , -1.11775184,  0.73648876],
       [-1.30909081, -1.14508596,  0.99653487,  1.22528468],
       [ 0.50770019, -0.70580009, -0.66642288,  0.60117039]])

* Python’s built-in random module, by contrast, only samples one value at a time. As you can see from this benchmark, numpy.random is well over an order of magnitude faster for generating very large samples:

In [None]:
from random import normalvariate
N = 1000000
%timeit samples = [normalvariate(0, 1) for _ in range(N)]

992 ms ± 349 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [None]:
%timeit np.random.normal(size=N)

32.9 ms ± 726 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


* We say that these are **pseudorandom numbers because they are generated by an algorithm with deterministic behavior based on the seed of the random number generator**.
* You can change NumPy’s random number generation seed using np.random.seed:

In [None]:
np.random.seed(1234)
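
* The deterministic behavior can be checked directly: resetting the same seed reproduces the same draws. This is a minimal sketch; the seed value and the sample size are arbitrary.

```python
import numpy as np

np.random.seed(1234)
first_draw = np.random.randn(3)

np.random.seed(1234)  # reset to the same seed
second_draw = np.random.randn(3)

# Identical seeds give identical sequences
print(np.array_equal(first_draw, second_draw))
```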

* The data generation functions in numpy.random use a global random seed.
* To avoid global state, you can use numpy.random.RandomState to create a random number generator isolated from others:

In [None]:
rng = np.random.RandomState(1234)
rng.randn(10)

array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
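
* In NumPy 1.17 and later, the documentation recommends the newer Generator interface, created with np.random.default_rng, for new code; RandomState is kept as a legacy interface. The sketch below is an alternative to the cell above, not a drop-in replacement: a Generator with seed 1234 produces different numbers than RandomState(1234).

```python
import numpy as np

rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)

samples_a = rng_a.standard_normal(10)  # Generator counterpart of randn
samples_b = rng_b.standard_normal(10)

# Each generator carries its own state, and equal seeds give equal streams
print(np.array_equal(samples_a, samples_b))
```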

* The table below provides a partial list of functions available in numpy.random.

<p align="center">
    <img src="http://raghudathesh.weebly.com/uploads/4/8/9/6/48968251/9_orig.png">
</p>
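
* A few more numpy.random functions are sketched below on an isolated RandomState. The selection (permutation, randint, uniform) is an assumption about what the image above lists, and the seed and sizes are arbitrary; no expected values are shown because only the properties of the results matter here.

```python
import numpy as np

rng = np.random.RandomState(0)

perm = rng.permutation(5)              # random ordering of 0, 1, 2, 3, 4
ints = rng.randint(0, 10, size=3)      # integers in the half-open range [0, 10)
floats = rng.uniform(0.0, 1.0, size=3) # floats drawn uniformly from [0, 1)

print(perm, ints, floats)
```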