# The numpy.random package

***


1. Explain the overall purpose of the package.
2. Explain the use of the “Simple random data” and “Permutations” functions.
3. Explain the use and purpose of at least five “Distributions” functions.
4. Explain the use of seeds in generating pseudorandom numbers.

NumPy (Numerical Python) is an open source Python library that’s used in almost every field of science and engineering. It’s the universal standard for working with numerical data in Python, and it’s at the core of the scientific Python and PyData ecosystems. NumPy users include everyone from beginning coders to experienced researchers doing state-of-the-art scientific and industrial research and development. The NumPy API is used extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-image and most other data science and scientific Python packages.

To access NumPy and its functions, import it into your Python code like this:

import numpy as np

The imported name is shortened to np for better readability of code using NumPy. This is a widely adopted convention that you should follow so that anyone working with your code can easily understand it.

#### Arrays in numpy
While a Python list can contain different data types within a single list, all of the elements in a NumPy array should be homogeneous (of the same type).
NumPy arrays are faster and more compact than Python lists: an array uses much less memory to store the same data and is convenient to use. NumPy also provides a mechanism for specifying the data types, which allows the code to be optimized even further.
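
The effect of the data type on memory use can be sketched as follows. The array names and sizes here are illustrative assumptions, not part of the original notebook; `itemsize` and `nbytes` report the bytes per element and the total bytes of array data.

```python
import numpy as np

# The same kind of data stored with two different dtypes (illustrative sizes).
big = np.arange(1000, dtype=np.int64)   # 8 bytes per element
small = np.arange(100, dtype=np.int8)   # 1 byte per element (values must fit in -128..127)

print(big.itemsize, big.nbytes)      # bytes per element, total bytes of data
print(small.itemsize, small.nbytes)
```

Choosing the smallest dtype that safely holds the values is one way the memory footprint can be reduced.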

#### what is an array in numpy
An array is a central data structure of the NumPy library. An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element. It has a grid of elements that can be indexed in various ways. The elements are all of the same type, referred to as the array dtype.

An array can be indexed by a tuple of nonnegative integers, by booleans, by another array, or by integers. The rank of the array is the number of dimensions. The shape of the array is a tuple of integers giving the size of the array along each dimension.
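
As a minimal sketch of the terms above, the following uses a small hypothetical array called `grid` to show the rank, shape and dtype, and the different ways of indexing.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.ndim)    # number of dimensions (the rank): 2
print(grid.shape)   # size along each dimension: (2, 3)
print(grid.dtype)   # the single element type shared by every value

element = grid[1, 2]             # tuple of nonnegative integers: row 1, column 2 -> 6
by_mask = grid[grid > 4]         # booleans: array([5, 6])
by_array = grid[[0, 1], [0, 2]]  # integer arrays: positions (0, 0) and (1, 2) -> array([1, 6])
```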

[NumPy for absolute beginners](https://numpy.org/doc/stable/user/absolute_beginners.html)

#### examples of arrays

In [1]:
import numpy as np

In [2]:
a = np.array([1, 2, 3])

In [3]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

We can access the elements in the array using square brackets. When you’re accessing elements, remember that indexing in NumPy starts at 0. That means that if you want to access the first element in your array, you’ll be accessing element “0”.

In [4]:
print(a[1])

[5 6 7 8]


We can also create an array without typing out its elements by hand:

In [5]:
# works with 'ones' also
np.zeros(11)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [6]:
# create an uninitialised array of the specified size
# the values are NOT random numbers: they are whatever happened to be in memory, so they can differ between runs
np.empty(5)

array([2.12199579e-314, 2.33644299e-307, 5.65211099e-321, 2.61005483e-312,
       3.22647083e-307])

In [7]:
# create an array with 40 elements
np.arange(40)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39])

In [8]:
# specify the first number, the stop value (which is excluded), and the step size
np.arange(2,33,2)

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32])

In [9]:
#use np.linspace() to create an array with values that are spaced linearly in a specified interval
np.linspace(2,100, num=5)

array([  2. ,  26.5,  51. ,  75.5, 100. ])

In [10]:
# the default data type is floating point (np.float64); pass dtype to get e.g. integers
x = np.ones(20, dtype=np.int64)
x
# the output has no decimal points

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
      dtype=int64)

#### sorting an array

In [11]:
# array numbers are not sorted
arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])
arr

array([2, 1, 5, 3, 7, 4, 6, 8])

In [12]:
# np.sort returns a sorted copy in ascending order; arr itself is unchanged
arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])
np.sort(arr)

array([1, 2, 3, 4, 5, 6, 7, 8])

Also look up: <br>
*argsort*, which is an indirect sort along a specified axis, <br>
*lexsort*, which is an indirect stable sort on multiple keys,<br>
*searchsorted*, which will find elements in a sorted array,<br>
*partition*, which is a partial sort<br>
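
A minimal sketch of these four functions, reusing the unsorted `arr` from above. The `primary` and `secondary` key arrays are hypothetical and only there to illustrate `lexsort`.

```python
import numpy as np

arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

# argsort: the indices that would sort the array
order = np.argsort(arr)
sorted_arr = arr[order]

# searchsorted: where a value would be inserted in an already sorted array
position = np.searchsorted(sorted_arr, 5)

# partition: the element at index 3 ends up in its sorted position;
# everything before it is <= that element, everything after is >=
part = np.partition(arr, 3)

# lexsort: indirect sort on several keys; the LAST key in the tuple is the primary key
primary = np.array([1, 1, 0])
secondary = np.array([2, 1, 3])
lex_order = np.lexsort((secondary, primary))

print(order, position, part[3], lex_order)
```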

#### concatenation of arrays

In [13]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8, 9])
np.concatenate((a, b))

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [14]:
# remember the double brackets: the arrays are passed together as one tuple
# the new array lists the elements in the order the arrays are given
np.concatenate((b, a))

array([5, 6, 7, 8, 9, 1, 2, 3, 4])

In [15]:
# an array can also be concatenated with itself
np.concatenate((b, b))

array([5, 6, 7, 8, 9, 5, 6, 7, 8, 9])

In [16]:
# concatenate two 2-D arrays of different shapes: (2, 2) and (1, 2)
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])
np.concatenate((x, y))  # axis=0 is the default

array([[1, 2],
       [3, 4],
       [5, 6]])
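
Concatenation can also run along the columns. The sketch below reuses `x` and `y` from the cell above; transposing `y` so that the shapes line up is an illustrative choice, not something taken from the notebook.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])   # shape (2, 2)
y = np.array([[5, 6]])           # shape (1, 2)

# axis=0 stacks rows: the column counts must match
rows = np.concatenate((x, y), axis=0)

# axis=1 stacks columns: the row counts must match, so y is transposed to shape (2, 1)
cols = np.concatenate((x, y.T), axis=1)

# without the transpose the row counts differ (2 vs 1) and NumPy raises a ValueError
mismatch_raised = False
try:
    np.concatenate((x, y), axis=1)
except ValueError:
    mismatch_raised = True

print(rows.shape, cols.shape, mismatch_raised)
```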

#### Dimensional arrays
An N-dimensional array is simply an array with any number of dimensions.<br> A vector is an array with a single dimension (there’s no difference between row and column vectors), while a matrix refers to an array with two dimensions. 

In [17]:
# each extra level of square brackets adds a dimension
# below, three 2x4 blocks are wrapped in one outer pair of brackets, giving a 3-D array
array_x = np.array([[[0, 1, 2, 3],
                     [4, 5, 6, 7]],

                    [[0, 1, 2, 3],
                     [4, 5, 6, 7]],

                    [[0, 1, 2, 3],
                     [4, 5, 6, 7]]])

In [18]:
array_x.ndim

3

In [19]:
array_x.size

24

In [20]:
# a 2-D array has shape (rows, columns)
# this 3-D array has shape (blocks, rows, columns)
array_x.shape

(3, 2, 4)

#### reshape an array

In [21]:
# the new shape must hold the same number of elements: 3 * 2 = 6
c = np.array([1, 2, 3, 4, 5, 6])
b = c.reshape(3, 2)
b

array([[1, 2],
       [3, 4],
       [5, 6]])
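
A short sketch of two details of `reshape` that the cell above does not show: passing `-1` lets NumPy work out one dimension itself, and a shape that does not hold exactly the same number of elements raises a `ValueError`. The variable names are illustrative.

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5, 6])

# -1 means "infer this dimension": 6 elements in 2 rows gives 3 columns
two_rows = c.reshape(2, -1)

# reshape(-1) flattens back to one dimension
flat = two_rows.reshape(-1)

# 4 * 2 = 8 does not match the 6 available elements
bad_shape_raised = False
try:
    c.reshape(4, 2)
except ValueError:
    bad_shape_raised = True

print(two_rows.shape, flat.shape, bad_shape_raised)
```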

<br>

#### Indexing & slicing (same as Python)

In [22]:
# NumPy & Python count elements as: 0,1,2,3,4,5
data = np.array([1, 2, 3, 4, 5, 6])

In [23]:
data[1]

2

In [24]:
data[1:4]

array([2, 3, 4])

In [25]:
data[3:]

array([4, 5, 6])

In [26]:
data[-2:]

array([5, 6])

In [27]:
#print all the values in the array <5
a = np.array([[1 , 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a[a < 5])

[1 2 3 4]


In [28]:
# build a boolean mask for 'greater than four' and use it to select elements
greater_than_four = (a >= 5)
print(a[greater_than_four])

[ 5  6  7  8  9 10 11 12]


In [29]:
# the mask itself is a boolean array with the same shape as a
greater_than_four = (a >= 5)
print([greater_than_four])

[array([[False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])]


In [30]:
divisible_by_2 = a[a%2==0]
print([divisible_by_2])

[array([ 2,  4,  6,  8, 10, 12])]


In [31]:
#apply two conditions
c = a[(a > 3) & (a < 9)]
print(c)

[4 5 6 7 8]


In [32]:
# imagine the rows of a stacked on top of each other, with row 0 and column 0 at the top left
# np.nonzero returns one array of row indices and one array of column indices for the matching elements
b = np.nonzero(a < 5)
print(b)

(array([0, 0, 0, 0], dtype=int64), array([0, 1, 2, 3], dtype=int64))
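
The two index arrays returned by `np.nonzero` can be paired up into (row, column) coordinates. A minimal sketch, using the condition `a > 6` as an illustrative example:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

rows, cols = np.nonzero(a > 6)

# pair the row and column indices into coordinates
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)

# indexing with the two index arrays selects the same elements as the boolean mask
selected = a[rows, cols]
print(selected)
```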


<br>

#### create an array from an existing array

In [33]:
# create a new array from position 3 up to position 8
# array numbering starts at zero
# the element at position 3 is included, the element at position 8 is excluded
a = np.array([1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
arr1 = a[3:8]
arr1

array([4, 5, 6, 7, 8])

#### stacking arrays

In [34]:
# vstack = vertical stack
# Note: a1 could equally be written across two lines, with [2, 2] on the next line:
# a1 = np.array([[1, 1],
#                [2, 2]])
a1 = np.array([[1, 1],[2, 2]])

a2 = np.array([[3, 3],[4, 4]])

np.vstack((a1, a2))

array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])

In [35]:
# or stack them horizontally: the 1st row of a1 is followed by the 1st row of a2, and so on
np.hstack((a1, a2))

array([[1, 1, 3, 3],
       [2, 2, 4, 4]])

In [36]:
# create one 2-D array with two rows from the numbers 1-24
x = np.arange(1, 25).reshape(2, 12)
x

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]])

In [37]:
# split x column-wise into three equally sized arrays (the number of dimensions is unchanged)
np.hsplit(x, 3)

[array([[ 1,  2,  3,  4],
        [13, 14, 15, 16]]),
 array([[ 5,  6,  7,  8],
        [17, 18, 19, 20]]),
 array([[ 9, 10, 11, 12],
        [21, 22, 23, 24]])]

In [38]:
#array_x.ndim
array_x.shape
#array_x.size

(3, 2, 4)

In [39]:
# split x after the 3rd and after the 4th column:
np.hsplit(x, (3, 4))

[array([[ 1,  2,  3],
        [13, 14, 15]]),
 array([[ 4],
        [16]]),
 array([[ 5,  6,  7,  8,  9, 10, 11, 12],
        [17, 18, 19, 20, 21, 22, 23, 24]])]

In [40]:
# split x after the 3rd, 4th and 5th column:
np.hsplit(x, (3, 4, 5))

[array([[ 1,  2,  3],
        [13, 14, 15]]),
 array([[ 4],
        [16]]),
 array([[ 5],
        [17]]),
 array([[ 6,  7,  8,  9, 10, 11, 12],
        [18, 19, 20, 21, 22, 23, 24]])]

In [41]:
# split x after every column from the 2nd to the 11th:
np.hsplit(x, (2,3,4,5,6,7,8,9,10,11))

[array([[ 1,  2],
        [13, 14]]),
 array([[ 3],
        [15]]),
 array([[ 4],
        [16]]),
 array([[ 5],
        [17]]),
 array([[ 6],
        [18]]),
 array([[ 7],
        [19]]),
 array([[ 8],
        [20]]),
 array([[ 9],
        [21]]),
 array([[10],
        [22]]),
 array([[11],
        [23]]),
 array([[12],
        [24]])]

In [42]:
ax = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# slicing returns a view of row 1 of ax, not a copy
b1 = ax[1, :]
b1

array([5, 6, 7, 8])

In [43]:
b1[1:3] = 99, 100

In [44]:
b1

array([  5,  99, 100,   8])

In [45]:
# note the original array has changed too, because b1 is a view of ax
# to avoid this, work on a copy instead, i.e. b2 = ax.copy()
ax

array([[  1,   2,   3,   4],
       [  5,  99, 100,   8],
       [  9,  10,  11,  12]])
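
The difference between a view and a copy can be sketched as follows. The names `view` and `independent` are illustrative; `np.shares_memory` reports whether two arrays share the same underlying data.

```python
import numpy as np

ax = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = ax[1, :]                 # slicing gives a view onto ax
independent = ax[1, :].copy()   # .copy() gives separate data

print(np.shares_memory(ax, view))         # True
print(np.shares_memory(ax, independent))  # False

# changing the copy leaves the original untouched
independent[1:3] = 99, 100
print(ax[1])
```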

#### add and subtract array elements

In [66]:
# data = [1, 2], ones = [1, 1]
# dtype=int makes np.ones return integers; by default it returns floats
# the arrays must have the same shape (or shapes that can be broadcast together)
data = np.array([1, 2])
ones = np.ones(2, dtype=int)
data + ones

array([2, 3])

In [47]:
data - ones

array([0, 1])

In [48]:
data * data

array([1, 4])

In [49]:
data / data

array([1., 1.])

In [73]:
a = np.array([1, 2, 3, 4])
a.sum()

10

In [74]:
# product of the array elements, i.e. 1*2*3*4 = 24
a.prod()

24

In [75]:
a.max()

4

In [76]:
a.mean()

2.5


In [83]:
# three rows of data, each containing 4 entries, i.e. 4 columns
# axis 0 = looking down vertically through the elements (one result per column)
# axis 1 = looking horizontally across the elements (one result per row)
#   C1 C2 C3 C4
#R1 1  1  5  8
#R2 2  2  6  10
#R3 2  5  6  8
b = np.array([[1, 1,5,8], [2, 2, 6, 10], [2, 5, 6, 8]])
b.sum(axis=0)

array([ 5,  8, 17, 26])

In [82]:
b.sum(axis=1)

array([15, 20, 21])

#### broadcasting

When an array is combined with a single number (for example, multiplied by it), NumPy understands that the operation should happen with each cell.<br> That concept is called broadcasting. Broadcasting is a mechanism that allows NumPy to perform operations on arrays of different shapes.<br> The dimensions of the arrays must be compatible: <br> two dimensions are compatible when they are equal or when one of them is 1.<br> If the dimensions are not compatible, you will get a `ValueError`.<br>

In [71]:
# e.g. convert all array elements from miles to km (approximately 1.6 km per mile)
miles = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
miles * 1.6

array([[ 1.6,  3.2,  4.8,  6.4],
       [ 8. ,  9.6, 11.2, 12.8],
       [14.4, 16. , 17.6, 19.2]])
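
Broadcasting also works between two arrays, not only between an array and a single number. A minimal sketch, assuming two hypothetical factor arrays `per_column` and `per_row` that are not part of the original notebook:

```python
import numpy as np

miles = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])  # shape (3, 4)

per_column = np.array([1, 10, 100, 1000])   # shape (4,): one factor per column
per_row = np.array([[1], [2], [3]])         # shape (3, 1): one factor per row

scaled_cols = miles * per_column   # (3, 4) with (4,)   -> (3, 4)
scaled_rows = miles * per_row      # (3, 4) with (3, 1) -> (3, 4)

# shape (3,) is not compatible with (3, 4): the trailing dimensions are 3 and 4
incompatible_raised = False
try:
    miles * np.array([1, 2, 3])
except ValueError:
    incompatible_raised = True

print(scaled_cols[0], scaled_rows[2], incompatible_raised)
```

Shapes are compared from the last dimension backwards, which is why a length-3 array has to be reshaped to (3, 1) before it can scale the rows.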

In [77]:
# what's the minimum looking top down (one result per column)
miles.min(axis =0)

array([1, 2, 3, 4])

In [78]:
# what's the minimum looking across (one result per row)
miles.min(axis =1)

array([1, 5, 9])

<br>

#### creating matrices from arrays
https://www.programiz.com/python-programming/matrix

[NumPy Random Number Generator](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html#numpy.random.Generator.integers)

In [54]:
#rng is random number generator
rng = np.random.default_rng(12345)
print(rng)

Generator(PCG64)


In [55]:
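rng = np.random.default_rng()
# with a single argument, integers are drawn from 0 up to (but not including) that number,
# so a value of 1 can only ever produce 0
rng.integers(1, size=10)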

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)

In [56]:
rng = np.random.default_rng()
# ten integers from 0 to 11; no seed is given, so the values differ on every run
rng.integers(12, size=10)

array([ 8,  5,  6,  1,  1, 10,  7,  2,  2,  5], dtype=int64)
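
Because the upper bound is excluded by default, simulating a six-sided die needs `high=7`, or `high=6` together with `endpoint=True`. A minimal sketch; the die example and the sample size of 1000 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng()

# low is included, high is excluded: values 1..6
dice = rng.integers(1, 7, size=1000)

# endpoint=True makes the upper bound inclusive: also values 1..6
dice_closed = rng.integers(1, 6, size=1000, endpoint=True)

print(dice.min(), dice.max())
print(dice_closed.min(), dice_closed.max())
```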

In [57]:
rng = np.random.default_rng(12345)
rints = rng.integers(low=0, high=10, size=3)
rints


array([6, 2, 7], dtype=int64)

In [58]:
rng.integers(5, size=(2, 4))

array([[1, 1, 3, 3],
       [3, 4, 1, 4]], dtype=int64)

In [59]:
rng = np.random.default_rng(seed=42)
arr2 = rng.random((3, 3))
arr2
# run the cell again and you get the same 'random' values, because the generator is re-created with the same seed


array([[0.77395605, 0.43887844, 0.85859792],
       [0.69736803, 0.09417735, 0.97562235],
       [0.7611397 , 0.78606431, 0.12811363]])
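
The role of the seed can be sketched as follows: two generators created with the same seed produce identical values, while repeated draws from one generator keep moving through its stream. The variable names are illustrative.

```python
import numpy as np

# same seed, two separate generators: identical output
first = np.random.default_rng(seed=42).random((3, 3))
second = np.random.default_rng(seed=42).random((3, 3))
print(np.array_equal(first, second))   # True

# one generator, two consecutive draws: the stream advances, so the values differ
rng = np.random.default_rng(seed=42)
draw1 = rng.random(3)
draw2 = rng.random(3)
print(np.array_equal(draw1, draw2))    # False
```

A fixed seed makes results reproducible, which helps when testing or sharing a notebook. Leaving the seed out gives fresh values on every run.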

In [60]:
rng.integers(100, size=100)

array([83, 45, 50, 37, 18, 92, 78, 64, 40, 82, 54, 44, 45, 22,  9, 55, 88,
        6, 85, 82, 27, 63, 16, 75, 70, 35,  6, 97, 44, 89, 67, 77, 75, 19,
       36, 46, 49,  4, 54, 15, 74, 68, 92, 74, 36, 96, 41, 32, 90, 37,  7,
       46, 79, 18, 46, 12, 68, 47, 33, 22, 56, 66, 94, 43, 16, 83, 62, 70,
        9, 31, 76, 83, 43, 80, 84, 38, 89, 28, 23, 68, 63, 13, 83, 19, 80,
        0, 79, 78, 78, 66, 47, 70, 27, 78, 55, 45, 50, 56,  3, 13],
      dtype=int64)

In [61]:
x = rng.integers(100, size=10000)

In [62]:
import matplotlib.pyplot as plt

In [63]:
# Jupyter magic command (note the leading %): use it if the histogram doesn't show
%matplotlib inline
plt.hist(x)
plt.show()  # may not be needed, but insert it if the plot is not showing in the notebook
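
Since `rng.integers` draws every value with equal probability, the histogram of `x` should look roughly flat. A minimal sketch of checking this numerically with `np.histogram`; the fixed seed of 12345 and the choice of 10 equal-width bins are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(12345)
x = rng.integers(100, size=10000)

# count how many values fall into each of 10 equal-width bins between 0 and 100
counts, edges = np.histogram(x, bins=10, range=(0, 100))

print(counts)        # each bin should hold roughly 1000 values
print(counts.sum())  # every value is counted exactly once
```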

[essential tips for writing with Jupyter Notebooks](https://towardsdatascience.com/7-essential-tips-for-writing-with-jupyter-notebook-60972a1a8901)