## Basic NumPy Functionality

### NumPy n-dim array
* Part of a data "ecosystem" that includes data science libraries, machine learning, and image processing
* NumPy is huge and very powerful --> we will just hit the highlights
* optimized array type (`ndarray`), faster and more memory-efficient than lists for numeric work
* can have "any" number of dimensions, though typically 1 or 2
* restricted to a single data type
* you must "dimension" (size) your arrays when you initialize them
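
The single-data-type restriction is easy to miss, so here is a minimal sketch of what it means in practice. The variable names `mixed` and `ints` are made up for illustration.

```python
import numpy as np

# A list can mix types freely; an array stores one dtype for all elements.
mixed = np.array([1, 2.5, 3])   # the ints are upcast so everything fits one dtype
print(mixed.dtype)              # float64

ints = np.array([1, 2, 3])
ints[0] = 7.9                   # a float assigned into an int array is truncated
print(ints)                     # [7 2 3]
```

The silent truncation in the second half is a common source of surprises when an array was created from integers.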

In [2]:
import numpy as np

In [4]:
l = [1, 2, 3] 
a = np.array([1,2,3])

In [8]:
np.__version__

'1.26.4'

In [12]:
l

[1, 2, 3]

In [14]:
a

array([1, 2, 3])

In [18]:
l + [1]

[1, 2, 3, 1]

In [22]:
a + a

array([2, 4, 6])

### Performance

In [26]:
big_array = np.arange(4.5e6)
big_list = big_array.tolist()

In [34]:
%timeit -n10 square = [x ** 2 for x in big_list]

591 ms ± 3.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [38]:
%timeit -n10 square = big_array ** 2 

10.2 ms ± 418 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
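
`%timeit` only works inside IPython/Jupyter. As a rough sketch, the same comparison can be run in a plain script with the standard-library `timeit` module. The array here is deliberately smaller than the 4.5e6 elements used above so that it finishes quickly, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

big_array = np.arange(1e5)      # smaller than the notebook's array, to keep this quick
big_list = big_array.tolist()

# Total time for 10 runs of the list comprehension and of the vectorized expression.
t_list = timeit.timeit(lambda: [x ** 2 for x in big_list], number=10)
t_array = timeit.timeit(lambda: big_array ** 2, number=10)
print(f"list: {t_list:.4f}s  array: {t_array:.4f}s")

# Both approaches compute the same values.
same = np.allclose([x ** 2 for x in big_list], big_array ** 2)
print(same)                     # True
```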


### Determining dimensions
* size --> total number of elements ("rows" * "columns" for a 2-D array)
* itemsize --> number of bytes used by a single value
* itemsize * size = bytes of memory the array's data uses
* shape --> tuple giving the length of each dimension
* ndim --> int number of dimensions
* reshape --> returns the same data arranged in a new shape

In [52]:
arr = np.arange(15)
coll = np.arange(1,16)
arr
coll

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [44]:
arr.size


15

In [46]:
len(arr)

15

In [48]:
arr.itemsize

4

In [54]:
big_array.size * big_array.itemsize

36000000

In [68]:
arr = arr.reshape(3,5)

In [70]:
arr.shape

(3, 5)

In [76]:
arra = np.arange(27)

In [86]:
arra = arra.reshape(3,3,3)

In [88]:
arra.ndim

3

In [90]:
arra.shape

(3, 3, 3)

In [98]:
arrg= arr.reshape(3,5)
arrg

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [100]:
arrg.reshape(-1)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
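
A short sketch of how `-1` works in `reshape`, and of how `len` differs from `size` once an array has more than one dimension. The names `wide` and `flat` are illustrative only.

```python
import numpy as np

arr = np.arange(15)
wide = arr.reshape(3, -1)   # -1 asks NumPy to infer the missing length: 15 / 3 = 5
print(wide.shape)           # (3, 5)
print(len(wide), wide.size) # 3 15  (len counts only the first dimension)

flat = wide.reshape(-1)     # a single -1 flattens back to 1-D
print(flat.shape)           # (15,)

try:
    arr.reshape(4, 4)       # 16 slots cannot hold 15 elements
except ValueError as err:
    print("reshape failed:", err)
```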

### "Slicing"

In [114]:
arr[2][4]

14

In [118]:
arr[:,0]

array([ 0,  5, 10])

In [120]:
arr[0,:]

array([0, 1, 2, 3, 4])

In [124]:
arr[:2]

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [130]:
arr[1:3]

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
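
The cells above index one axis at a time. As a small sketch, rows and columns can also be sliced together inside one pair of brackets, and basic slices share memory with the original array. The names `block` and `col` are illustrative.

```python
import numpy as np

arr = np.arange(15).reshape(3, 5)

# arr[2][4] indexes twice (row, then element); arr[2, 4] does it in one step.
print(arr[2][4], arr[2, 4])   # 14 14

# Slice both axes at once: first two rows, last two columns.
block = arr[:2, 3:]
print(block)                  # [[3 4]
                              #  [8 9]]

# Basic slices are views: writing through one changes the original array.
col = arr[:, 0]
col[0] = 99
print(arr[0, 0])              # 99
```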

### Special arrays
* useful for dimensioning an array before you know what values it will hold

In [138]:
np.zeros(15).reshape(3,5)

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [140]:
np.ones(20).reshape(4,5)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [148]:
np.eye(6,5, k=2)

array([[0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
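
The cells above build a flat array and then reshape it. As an alternative sketch, `np.zeros` and `np.ones` also accept the shape directly as a tuple, plus an optional `dtype`.

```python
import numpy as np

zeros = np.zeros((3, 5))            # same result as np.zeros(15).reshape(3, 5)
ones = np.ones((4, 5), dtype=int)   # dtype is optional; the default is float64
shifted = np.eye(6, 5, k=2)         # k=2 puts the ones two columns right of the main diagonal

print(zeros.shape, ones.dtype, shifted.sum())
```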

### Working with random numbers
* many different distribution options
* normal and uniform most common
* normal, standard_normal
* randn, randint

### Standard normal
* mean 0
* std 1
* takes a single optional argument --> `size` (an int or a tuple of ints)
* `np.random.standard_normal(size=None)`

In [152]:
from numpy import random as npr

In [154]:
npr.standard_normal(15)

array([-0.13346826, -0.5200968 , -0.81729832,  0.89829368, -0.51348816,
        0.07349558, -0.50764369,  0.21228741, -0.20513472, -0.28022506,
        1.69080954,  1.91998026, -0.97533575,  0.94606427, -1.76005503])
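
A quick sketch showing that `size` can also be a tuple, and that a large sample lands close to mean 0 and std 1. The seed is only there to make the sketch repeatable; it is not part of the original notebook.

```python
from numpy import random as npr

npr.seed(1)                           # seeded only for repeatability
grid = npr.standard_normal((2, 3))    # size as a tuple: 2 rows, 3 columns
big = npr.standard_normal(100_000)

print(grid.shape)                     # (2, 3)
print(big.mean(), big.std())          # close to 0 and 1, not exactly
```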

### Uniform Distributions
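* `np.random.rand(d0, d1, ..., dn)`
    * takes the dimensions as separate positional arguments (there is no `size` keyword); samples are uniform on [0, 1)
* `np.random.randint(low, high=None, size=None, dtype=int)`
    * excludes high --> called a half-open interval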

In [237]:
npr.rand(3)

array([0.28282685, 0.02525749, 0.99453711])

In [186]:
npr.randint(1,6)

1

In [4]:
# notice we never get to 6...
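
To make the half-open interval concrete, here is a sketch that draws many values at once instead of re-running the cell. The seed and the names `rolls` and `die` are assumptions added for illustration.

```python
import numpy as np
from numpy import random as npr

npr.seed(0)                             # seeded only for repeatability
rolls = npr.randint(1, 6, size=10_000)
print(rolls.min(), rolls.max())         # 1 5  (6 is never drawn)

# To simulate a six-sided die, the upper bound has to be 7.
die = npr.randint(1, 7, size=10_000)
print(np.unique(die))                   # [1 2 3 4 5 6]
```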


### Normal
* loc = mean
* scale = std
* if not specified mean = 0, std = 1

In [520]:
x = npr.normal(loc=10, scale=5, size = 100)

In [522]:
x

array([ 5.54806558,  3.45132194, 12.39247371,  7.98246583,  6.13320566,
       16.64132442, 16.30749735,  6.31219501,  4.15949095, 24.03148737,
        3.67979669,  7.61481082, 11.23947141, 10.14811585,  7.83763877,
       12.229815  , 11.98919957, -3.69866276, 12.44121956,  4.09408377,
        9.5181509 , 13.2155825 ,  0.4351285 , 13.20785653, 11.40225977,
       18.13083464,  5.76265831, 14.18683404, 16.72553878, 10.56818488,
        7.84144356,  7.79074722, 10.81061446,  3.27933798, 19.6799218 ,
       12.89104947,  8.53923524, 12.59076194, 16.11542789, 11.70387626,
       17.19619368, 11.55994548, 13.06794396,  8.9174031 , 14.30086751,
       12.47422785, 14.90399122,  7.47555249,  0.06408367,  5.24062514,
       11.40858844,  9.08520636, 10.67647124,  3.23269322, 13.65703179,
        6.89095471, 13.01748186, 16.15901827, 13.11951059, 17.10937351,
        7.60069688,  6.08749487, 15.18430415,  4.92910218, 12.04416807,
       12.16236879, 11.24631976, 14.13309454,  3.19888941,  6.80

In [524]:
print(x.mean())
print(x.std())
print(x.max())
print(x.min())
# ndarray has no .median() method; use np.median(x) instead

10.183517231561826
4.608432671141952
24.03148737102095
-3.698662756910327
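
Since arrays have no `.median()` method, the median comes from the module-level function. A minimal sketch on a tiny hand-made array (the values in `sample` are made up so the results can be checked by hand):

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.median(sample))   # 4.5  (average of the two middle values, 4 and 5)
print(sample.mean())       # 5.0
print(sample.std())        # 2.0
```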


### Simple Linear Modeling

In [532]:
x = npr.normal(size = 50)
y = x + npr.normal(loc = 10, scale = 2, size = 50)

In [534]:
np.corrcoef(x,y)

array([[1.        , 0.40751276],
       [0.40751276, 1.        ]])

In [544]:
slope, intercept = np.polyfit(x,y, deg=1)
print(slope)
print(intercept)

0.7911241567610863
10.395009669774094


### More syntactically intense method to do the same thing
* Requires a "dummy" variable (a column of ones) next to the actual x variable
* `rcond` sets the cutoff below which extremely small singular values are treated as zero
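
The cell for this method is not shown here. The bullets appear to describe `np.linalg.lstsq`; that is an assumption based on the mention of `rcond`. Under that assumption, a sketch of the fit might look like the following, with freshly generated (seeded) data rather than the notebook's values.

```python
import numpy as np
from numpy import random as npr

npr.seed(0)                                   # seeded only for repeatability
x = npr.normal(size=50)
y = x + npr.normal(loc=10, scale=2, size=50)

# Design matrix: the x values next to a "dummy" column of ones for the intercept.
A = np.vstack([x, np.ones(len(x))]).T

# rcond=None uses NumPy's default cutoff for small singular values.
(slope, intercept), residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

# polyfit with deg=1 should describe the same line.
p_slope, p_intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)
print(p_slope, p_intercept)
```

Both calls solve the same least-squares problem, so the two printed lines should agree to floating-point precision.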

### Fill array with uniformly spaced data points between a minimum and maximum
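
A small sketch of how the spacing works, using fewer points so the result is easy to read. Both endpoints are included by default, and `retstep=True` also returns the step size.

```python
import numpy as np

pts = np.linspace(-4, 4, 5)      # 5 points, both endpoints included
print(pts)                       # [-4. -2.  0.  2.  4.]

# With num points the step is (stop - start) / (num - 1).
grid, step = np.linspace(-4, 4, 100, retstep=True)
print(step)                      # 8 / 99, roughly 0.0808
```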

In [546]:
np.linspace(-4,4, 100)

array([-4.        , -3.91919192, -3.83838384, -3.75757576, -3.67676768,
       -3.5959596 , -3.51515152, -3.43434343, -3.35353535, -3.27272727,
       -3.19191919, -3.11111111, -3.03030303, -2.94949495, -2.86868687,
       -2.78787879, -2.70707071, -2.62626263, -2.54545455, -2.46464646,
       -2.38383838, -2.3030303 , -2.22222222, -2.14141414, -2.06060606,
       -1.97979798, -1.8989899 , -1.81818182, -1.73737374, -1.65656566,
       -1.57575758, -1.49494949, -1.41414141, -1.33333333, -1.25252525,
       -1.17171717, -1.09090909, -1.01010101, -0.92929293, -0.84848485,
       -0.76767677, -0.68686869, -0.60606061, -0.52525253, -0.44444444,
       -0.36363636, -0.28282828, -0.2020202 , -0.12121212, -0.04040404,
        0.04040404,  0.12121212,  0.2020202 ,  0.28282828,  0.36363636,
        0.44444444,  0.52525253,  0.60606061,  0.68686869,  0.76767677,
        0.84848485,  0.92929293,  1.01010101,  1.09090909,  1.17171717,
        1.25252525,  1.33333333,  1.41414141,  1.49494949,  1.57