# Intro to NumPy and SciPy

## NumPy

- Short for Numerical Python 
- It is a fundamental library in the data science (DS) ecosystem
- It provides powerful tools for working with multidimensional arrays and matrices:
    - N-dimensional arrays (`ndarray`): the core data structure. They offer efficient storage for various data types: strings, integers, floats, booleans, etc.
    - Array operations: a rich set of mathematical and statistical functions
    - Indexing, slicing, and merging of arrays
    - Linear Algebra
    - Random array/matrix generation

- Advantages:
    - Performance
    - Memory Efficiency
    - Interoperability: it integrates easily with other DS libraries, such as Pandas, SciPy, Scikit-Learn, Matplotlib, TensorFlow, etc.
    - It's the foundation of many DS libraries
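
To make the memory-efficiency point concrete, here is a minimal sketch comparing the storage used by a Python list and by an `ndarray` holding the same integers. The names and the size of 1,000 elements are arbitrary choices for illustration, and the exact list size depends on the Python build.

```python
import sys
import numpy as np

n = 1000  # illustrative size, not from the notes above
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects,
# so count the pointer table plus every int object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An ndarray stores the raw values in one contiguous buffer: 8 bytes per int64
arr_bytes = np_arr.nbytes

print(f"list: ~{list_bytes} bytes, ndarray buffer: {arr_bytes} bytes")
```

The array buffer is exactly `n * 8` bytes, while the list carries per-object overhead on top of the values themselves.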

### Basics

`pip install numpy`

In [2]:
import numpy as np

In [3]:
#let's build our first array

arr = np.array([1,2,3])
arr

array([1, 2, 3])

In [4]:
type(arr)

numpy.ndarray

In [5]:
#convert a list into an array

my_list = [4,5,6,8,12]

arr = np.array(my_list)

print(type(my_list))
print(type(arr))

<class 'list'>
<class 'numpy.ndarray'>


### `arange()` function

In [6]:
list(range(0,20))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [7]:
#in numpy
np.arange(0,20)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [9]:
arr2 = np.arange(0, 5, 0.5) # fractional increments
arr2

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [12]:
arr2.dtype

dtype('float64')
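
A few more `arange()` variations, as a small sketch. The variable names are made up for illustration.

```python
import numpy as np

# np.arange(start, stop, step): stop is excluded, like range()
evens = np.arange(0, 10, 2)

# The dtype can be set explicitly instead of being inferred
as_float = np.arange(5, dtype=np.float64)

# With a fractional step, the length is ceil((stop - start) / step)
halves = np.arange(0, 5, 0.5)

print(evens, as_float.dtype, len(halves))
```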

### Speed Test Between `ndarray` and `list`

In [13]:
import time

# Define the size of the data
size = 10**6

# Create a list and a NumPy array holding the same sequential data
list_data = list(range(size))
numpy_array = np.arange(size)

# Perform element-wise multiplication using a loop (for list)
start_time = time.time()
for i in range(size):
    list_data[i] *= 2
end_time = time.time()
list_time = end_time - start_time

# Perform element-wise multiplication using NumPy
start_time = time.time()
numpy_array *= 2
end_time = time.time()
numpy_time = end_time - start_time

print(f"Time taken for list: {list_time} seconds")
print(f"Time taken for NumPy array: {numpy_time} seconds")


Time taken for list: 0.0642697811126709 seconds
Time taken for NumPy array: 0.0008871555328369141 seconds
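
A single `time.time()` measurement can be noisy. One possible alternative, sketched below, is to repeat each operation with the standard `timeit` module and keep the best run. The smaller size and the repeat counts are assumptions chosen so the cell finishes quickly; actual timings will vary by machine.

```python
import timeit
import numpy as np

size = 10**5  # smaller than the 10**6 used above so the sketch runs quickly
list_data = list(range(size))
numpy_array = np.arange(size)

# Build new objects instead of modifying in place,
# so every repetition does the same amount of work
list_time = min(timeit.repeat(lambda: [v * 2 for v in list_data], number=10, repeat=3))
numpy_time = min(timeit.repeat(lambda: numpy_array * 2, number=10, repeat=3))

print(f"list comprehension: {list_time:.4f} s for 10 runs")
print(f"NumPy array:        {numpy_time:.4f} s for 10 runs")
```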


### Array Dimensions

In [14]:
# 0-d array (scalar)
a0 = np.array(50)
a0.ndim

0

In [15]:
#1-d array - vector
a1 = np.array([4,6,8])
a1.ndim

1

In [16]:
#2-d array (matrix)
a2 = np.array([[10,5,7],
               [12,7,14]])

print(a2)

[[10  5  7]
 [12  7 14]]


In [17]:
a2.ndim

2

In [18]:
# get the shape of the array (2 x 3)
# (num of rows, num of cols)
a2.shape

(2, 3)

In [22]:
a2.size #gives the number of elements in the array

6

In [24]:
# 3-d array (cube)
a3 = np.array([[[1,2,3],
                [4,5,6],
                [4,5,6]],
                [[6,8,5],
                [0,5,3],
                [4,5,6]]
                ])
a3.ndim

3

![3d](https://miro.medium.com/v2/resize:fit:817/0*y04Nh3L0aSwyGaby.png)

In [25]:
a3.shape #(2 layers, 3 rows, 3 columns)

(2, 3, 3)

In [26]:
print(f'This array has {a3.shape[0]} layers, {a3.shape[1]} rows, and {a3.shape[2]} columns')

This array has 2 layers, 3 rows, and 3 columns
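
The three attributes used in this section can be collected in one place. The `describe` helper below is hypothetical, written only to summarise `ndim`, `shape`, and `size` side by side.

```python
import numpy as np

def describe(arr):
    """Return (ndim, shape, size) for any array-like input (illustrative helper)."""
    arr = np.asarray(arr)
    return arr.ndim, arr.shape, arr.size

print(describe(50))                          # scalar
print(describe([4, 6, 8]))                   # vector
print(describe([[10, 5, 7], [12, 7, 14]]))   # matrix
print(describe(np.zeros((2, 3, 3))))         # 3-d array
```

Note that `size` is always the product of the entries in `shape`.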


### Arithmetic Operations in NumPy

In [49]:
arr1 = np.array([3,3,5])
arr2 = np.array([4,5,10])

In [50]:
arr1 + arr2

array([ 7,  8, 15])

In [31]:
arr1 * arr2

array([12, 15, 50])

In [33]:
arr1 / arr2

array([0.75, 0.6 , 0.5 ])
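
The other arithmetic operators are element-wise as well. The sketch below uses its own arrays (`x` and `y` are illustrative names) and also shows `@`, which is not element-wise: for two 1-d arrays it returns the dot product.

```python
import numpy as np

x = np.array([3, 3, 6])
y = np.array([4, 5, 10])

diff = y - x        # element-wise subtraction
squares = x ** 2    # element-wise power
floor_div = y // x  # element-wise integer division
dot = x @ y         # dot product: 3*4 + 3*5 + 6*10

print(diff, squares, floor_div, dot)
```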

### Broadcasting

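The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is "broadcast" across the larger one so that the two have compatible shapes.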

In [32]:
# scalar * array uses broadcasting
50 * arr1

array([150, 150, 250])

![bc](https://numpy.org/doc/stable/_images/broadcasting_1.png)

In [35]:
a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])

a + b

array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])

![bc2](https://numpy.org/doc/stable/_images/broadcasting_2.png)

In [38]:
b_ver = np.array([[1.0, 2.0, 3.0]]).reshape(3,1)
b_ver

array([[1.],
       [2.],
       [3.]])
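
A column vector like this one becomes useful when it is combined with a row vector. The sketch below uses illustrative names (`row`, `col`, `table`) and shows a `(3, 1)` array and a `(3,)` array broadcasting to a `(3, 3)` result.

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0])   # shape (3,)
col = row.reshape(3, 1)           # shape (3, 1), same values as a column

# (3, 1) + (3,) broadcasts to (3, 3): every column value meets every row value
table = col + row
print(table.shape)
print(table)
```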

In [42]:
a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0, 5.7])

a + b

ValueError: operands could not be broadcast together with shapes (4,3) (4,) 

![br3](https://numpy.org/doc/stable/_images/broadcasting_3.png)
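
The failure above follows from the broadcasting rule: shapes are compared from the last axis backwards, and each pair of sizes must be equal or one of them must be 1. The sketch below checks this with `np.broadcast_shapes` (assumed available, NumPy 1.20 or later) and then shows one way to make the 4-element array usable, by treating it as a column. Whether a column is what you actually want depends on the data.

```python
import numpy as np

a = np.zeros((4, 3))
b = np.array([1.0, 2.0, 3.0, 5.7])   # shape (4,)

# Last axes are 3 and 4: not equal and neither is 1, so broadcasting fails
try:
    np.broadcast_shapes(a.shape, b.shape)
    compatible = True
except ValueError:
    compatible = False

# Turning b into a (4, 1) column lines its 4 values up with the 4 rows of a
result = a + b[:, np.newaxis]
print(compatible, result.shape)
```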

### NumPy Functions

In [58]:
arr = np.arange(1,10)
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [59]:
print(arr.reshape(3,3))

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [61]:
my_arr_dim = arr.reshape(3,3).ndim
my_arr_dim

2

In [62]:
arr_bef = np.arange(1,13)

arr_bef

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [63]:
arr_aft = arr_bef.reshape(2,2,3) # layer, row, col

In [64]:
print('Before\n',arr_bef)
print('After\n',arr_aft)

Before
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
After
 [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


In [65]:
# convert back into 1d without specifying the dimensions (-1)
# flatten the array
arr_aft.reshape(-1)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
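
The `-1` placeholder also works for a single dimension of a multi-dimensional shape: NumPy infers it from the total number of elements. A short sketch with illustrative names:

```python
import numpy as np

arr = np.arange(1, 13)

two_rows = arr.reshape(2, -1)     # -1 is inferred as 6
three_cols = arr.reshape(-1, 3)   # -1 is inferred as 4

# The total number of elements must stay the same,
# so 12 elements cannot be arranged in 5 columns
try:
    arr.reshape(-1, 5)
    fits = True
except ValueError:
    fits = False

print(two_rows.shape, three_cols.shape, fits)
```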

In [66]:
#mod 
10 % 2

0

In [68]:
arr1 = np.array([10,20,17]) 
arr2 = np.array([2,5,2])

np.mod(arr1,arr2)

array([0, 0, 1])

In [69]:
np.add(arr1,arr2)

array([12, 25, 19])
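
Functions such as `np.mod` and `np.add` give the same results as the corresponding operators on arrays. A quick check, using illustrative names:

```python
import numpy as np

p = np.array([10, 20, 17])
q = np.array([2, 5, 2])

same_mod = np.array_equal(np.mod(p, q), p % q)
same_add = np.array_equal(np.add(p, q), p + q)

# Other arithmetic functions follow the same pattern
product = np.multiply(p, q)
powers = np.power(q, 2)

print(same_mod, same_add, product, powers)
```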

### String/Text Arrays

In [71]:
a = np.array(['a', 'b', 'c'])
a

array(['a', 'b', 'c'], dtype='<U1')

In [73]:
# using the add function for string
a = np.array(['Hello', 'Welcome'])
b = np.array([' Learners', ' To The Class'])

np.char.add(a,b) # string functions live in the 'char' submodule

array(['Hello Learners', 'Welcome To The Class'], dtype='<U20')

In [74]:
# concatenate joins 2 arrays end to end
np.concatenate((a,b))

array(['Hello', 'Welcome', ' Learners', ' To The Class'], dtype='<U13')

In [76]:
# concatenate does the same for numbers - similar to UNION ALL in SQL (rows are appended, duplicates are kept)
arr1 = np.array([[10,20],
                 [4,9]])

arr2 = np.array([[5,7]])

np.concatenate((arr1,arr2), axis=0)


array([[10, 20],
       [ 4,  9],
       [ 5,  7]])

In [77]:
arr1 = np.array([[10,20],
                 [4,9]])

arr2 = np.array([[8,9],
                 [7,3]])

np.concatenate((arr1,arr2), axis=1) # merge the 2 arrays horizontally (side by side)


array([[10, 20,  8,  9],
       [ 4,  9,  7,  3]])

In [78]:
arr1 = np.array([[10,20, 5, 5],
                 [4, 9, 2, 3]])
arr2 = np.array([[8,9],
                 [7,3]])
np.concatenate((arr1,arr2), axis=1)

array([[10, 20,  5,  5,  8,  9],
       [ 4,  9,  2,  3,  7,  3]])
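
`np.vstack` and `np.hstack` are shorthand for the two concatenations above. The sketch also shows the shape requirement: along `axis=0` the column counts must match, and along `axis=1` the row counts must match. Variable names are illustrative.

```python
import numpy as np

top = np.array([[10, 20], [4, 9]])
extra_row = np.array([[5, 7]])
right = np.array([[8, 9], [7, 3]])

stacked_rows = np.vstack((top, extra_row))   # same as concatenate(..., axis=0)
stacked_cols = np.hstack((top, right))       # same as concatenate(..., axis=1)

# Side-by-side merging needs the same number of rows: 2 rows vs 1 row fails
try:
    np.concatenate((top, extra_row), axis=1)
    ok = True
except ValueError:
    ok = False

print(stacked_rows.shape, stacked_cols.shape, ok)
```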

In [80]:
a = np.array(['Hello', 'Welcome', 'HELLO', 'hello'])
np.char.upper(a)

array(['HELLO', 'WELCOME', 'HELLO', 'HELLO'], dtype='<U7')

In [82]:
arr2 = np.array(['Model A', 'Model B', 'Model C'])

np.char.replace(arr2,'Model','Design')

array(['Design A', 'Design B', 'Design C'], dtype='<U8')
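
String functions can be chained to clean up text labels. A minimal sketch, assuming some made-up labels with stray whitespace and mixed case:

```python
import numpy as np

labels = np.array(['  Hello', 'Welcome ', 'HELLO', 'hello'])

cleaned = np.char.lower(np.char.strip(labels))   # trim whitespace, then lowercase
lengths = np.char.str_len(cleaned)               # characters per element
unique_labels = np.unique(cleaned)               # distinct values, sorted

print(cleaned, lengths, unique_labels)
```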

### Statistical Functions in NumPy

In [83]:
arr = np.arange(1,20)
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

In [84]:
# get the avg or mean
np.mean(arr)

10.0

In [88]:
a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 6.0],
              [20.0, 3.0, 43.0],
              [30.0, 9.0, 30.0]])

In [89]:
np.mean(a)

13.416666666666666

![ax](https://www.sharpsightlabs.com/wp-content/uploads/2018/12/numpy-arrays-have-axes_updated_v2.png)

In [90]:
# average by column
np.mean(a, axis=0)

array([15.  ,  5.5 , 19.75])

In [91]:
#average by row
np.mean(a, axis=1)

array([ 0.        ,  8.66666667, 22.        , 23.        ])

In [92]:
# 3-d array (cube)
a3 = np.array([[[1,2,3],
                [4,5,6],
                [4,5,6]],
                [[6,8,5],
                [0,5,3],
                [4,5,6]]
                ])

np.mean(a3, axis=2)

array([[2.        , 5.        , 5.        ],
       [6.33333333, 2.66666667, 5.        ]])

In [93]:
np.median(a)

9.5

In [94]:
np.std(a) #standard deviation

13.585889330069229

In [95]:
np.max(a,axis=0)

array([30., 10., 43.])
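
The `axis` argument works the same way for the other aggregation functions. The sketch below reuses the values of `a` from this section; the result names are illustrative.

```python
import numpy as np

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0,  6.0],
              [20.0,  3.0, 43.0],
              [30.0,  9.0, 30.0]])

col_totals = np.sum(a, axis=0)      # collapse the rows: one total per column
row_mins = np.min(a, axis=1)        # collapse the columns: one minimum per row
variance = np.var(a)                # variance is the square of the standard deviation
row_of_max = np.argmax(a, axis=0)   # row index of the largest value in each column

print(col_totals, row_mins, variance, row_of_max)
```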

### Filtering in NumPy

In [97]:
arr = np.array([5,6,7,2,3,22,12,14,11,6,23])

# simple filtering with a boolean condition inside the brackets

filtered_array = arr[arr<13]
filtered_array

array([ 5,  6,  7,  2,  3, 12, 11,  6])

In [98]:
# using multiple conditions
filtered_array = arr[(arr<13)&(arr>3)]
filtered_array

array([ 5,  6,  7, 12, 11,  6])

In [100]:
mask = np.where(arr>3)

filtered_array = arr[mask]
filtered_array

array([ 5,  6,  7, 22, 12, 14, 11,  6, 23])

In [101]:
np.where(arr>3, True, False)

array([ True,  True,  True, False, False,  True,  True,  True,  True,
        True,  True])

In [102]:
np.where(arr>3, 'more than 3', '3 or less')

array(['more than 3', 'more than 3', 'more than 3', '3 or less',
       '3 or less', 'more than 3', 'more than 3', 'more than 3',
       'more than 3', 'more than 3', 'more than 3'], dtype='<U11')
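
A few more mask operations, sketched on the same values. `|` is element-wise OR and `~` negates a mask. With a single argument, `np.where` returns a tuple of index arrays rather than the values. Result names are illustrative.

```python
import numpy as np

arr = np.array([5, 6, 7, 2, 3, 22, 12, 14, 11, 6, 23])

either = arr[(arr < 5) | (arr > 20)]   # values below 5 or above 20
negated = arr[~(arr > 3)]              # values that are NOT greater than 3
positions = np.where(arr > 3)[0]       # index positions where the condition holds
how_many = np.count_nonzero(arr > 3)   # number of True values in the mask

print(either, negated, positions, how_many)
```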

### Slicing and Dicing Arrays Using Indices

![tag](https://www.oreilly.com/api/v2/epubs/9781449323592/files/httpatomoreillycomsourceoreillyimages2172112.png)

In [104]:
a = np.array([[ 4.0,  8.0,  0.0],
              [10.0, 10.0,  6.0],
              [20.0,  3.0, 43.0]
              ])

In [105]:
a[0,1] # [row pos, col pos]

8.0

In [106]:
a[2,2]

43.0

In [107]:
a[1]

array([10., 10.,  6.])

In [109]:
# get the first 2 rows
# ranges
a[0:2] # start from index pos 0 and stop right before pos 2

array([[ 4.,  8.,  0.],
       [10., 10.,  6.]])

In [113]:
a[:2] # omitting the start defaults to index pos 0; stop right before pos 2

array([[ 4.,  8.,  0.],
       [10., 10.,  6.]])

In [111]:
a[1:2]

array([[10., 10.,  6.]])

In [112]:
# grab the last row of the array
a[-1]

array([20.,  3., 43.])

In [114]:
# ranges. slicing for rows and columns

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 6.0],
              [20.0, 3.0, 43.0],
              [30.0, 9.0, 30.0]])

In [115]:
a[2:,1:]

array([[ 3., 43.],
       [ 9., 30.]])

In [117]:
a[2,1] + a[2,2]

46.0

In [119]:
a[2:,:2]

array([[20.,  3.],
       [30.,  9.]])

In [120]:
# convert the last row of the array to a list
a[-1].tolist()

[30.0, 9.0, 30.0]

In [122]:
print(type(a[-1]))
print(type(a[-1].tolist()))

<class 'numpy.ndarray'>
<class 'list'>


In [123]:
a3 = np.array([[[1,2,3],
                [4,5,6],
                [4,5,6]],
                [[6,8,5],
                [0,5,3],
                [4,5,6]]
                ])

a3

array([[[1, 2, 3],
        [4, 5, 6],
        [4, 5, 6]],

       [[6, 8, 5],
        [0, 5, 3],
        [4, 5, 6]]])

In [124]:
a3[1,:2, 1:]

array([[8, 5],
       [5, 3]])

In [135]:
np.concatenate(([a3[0,-1]],[a3[1,0]]),axis=0)

array([[4, 5, 6],
       [6, 8, 5]])

In [130]:
a3[1,0]

array([6, 8, 5])

In [131]:
a3[0,-1]

array([4, 5, 6])
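
Two slicing details worth knowing: a colon on its own selects a whole axis, and a basic slice is a view on the original data rather than a copy. The sketch below rebuilds the 4 x 3 array from earlier in this section; names like `view` and `safe` are illustrative.

```python
import numpy as np

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0,  6.0],
              [20.0,  3.0, 43.0],
              [30.0,  9.0, 30.0]])

second_col = a[:, 1]    # every row, column 1
every_other = a[::2]    # rows 0 and 2 (step of 2)

# A basic slice is a view: writing to it changes the original array
view = a[:2, :2]
view[0, 0] = 99.0

# .copy() gives independent data, so the original is untouched
safe = a[:2, :2].copy()
safe[0, 0] = -1.0

print(second_col, every_other.shape, a[0, 0])
```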

### Generating Data in NumPy

In [9]:
# using randint
np.random.randint(1,101,size=40)

array([61, 64, 84, 48, 59, 49, 70, 98, 19, 35, 60, 72,  3,  4, 77, 24, 93,
       59, 98, 72, 51, 51, 95, 95, 56, 45, 24, 21, 95, 11, 73, 97, 59, 73,
        3, 27, 49, 71, 22, 30])

In [11]:
# generate evenly spaced values using linspace
np.linspace(0,1,20, retstep=True)

(array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
        0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
        0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
        0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ]),
 0.05263157894736842)

In [12]:
np.linspace(2,8,4, retstep=True)

(array([2., 4., 6., 8.]), 2.0)

In [14]:
arr = np.random.randint(1,101,(10,10))
arr

array([[ 39,   2,  38,  66,   1,  64,  50,   3,  99,  23],
       [ 42,  54,  99,  52,  64,   9,  22,  10,  13,  58],
       [ 30,  70,   2,  20, 100,  81,  12,  50,  71,  41],
       [ 54,  61,  95,  77,  73,  81,  58,   9,  29,   9],
       [  1,  51,   2,  29,  61,  39,  54,  89,  75,  11],
       [ 50,  97,  40,  74,  97,  60,  45,  94,  75,  23],
       [ 80,  38,  45,  47,  77, 100,   8,  43,   7,  95],
       [ 56,  52,  32, 100,  66,  46,  22,  26,  78,  97],
       [ 13,  51,  88,  30,  46,  55,  78,  86,   8,   3],
       [ 46,  65,  47,  16,  62,  56,  61,  12,  19,  20]])

In [18]:
#using list comprehension
arr = np.array([x**2 for x in range(12)])
arr

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121])
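
Random output changes on every run, which makes results hard to reproduce. One option is the `np.random.default_rng` generator with a fixed seed, sketched below. The seed value 42 and the shapes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)   # 42 is an arbitrary seed

ints = rng.integers(1, 101, size=(3, 4))   # integers from 1 to 100 (upper bound excluded)
floats = rng.random(5)                     # uniform floats in [0, 1)

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(42)
same = np.array_equal(ints, rng_again.integers(1, 101, size=(3, 4)))

# Constant-filled arrays
zeros = np.zeros((2, 3))
ones = np.ones(4)

print(ints.shape, same, zeros.shape, ones.sum())
```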

### Import Data In NumPy

In [35]:
data = np.loadtxt('dummy_data.txt', delimiter=',')
data

array([[ 5.,  1.],
       [ 5.,  3.],
       [ 9.,  5.],
       [ 5.,  7.],
       [ 8.,  8.],
       [ 5.,  9.],
       [ 6.,  0.],
       [ 5., 11.],
       [ 4.,  5.]])
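
`dummy_data.txt` is not included with these notes. To try `loadtxt` without a file, one option is to pass a file-like object instead of a path. The text below is an assumed sample in the same comma-separated layout, not the actual file contents.

```python
import io
import numpy as np

# Assumed file contents: comma-separated numbers, one row per line
text = "5,1\n5,3\n9,5\n"

# loadtxt accepts file-like objects as well as file names
data = np.loadtxt(io.StringIO(text), delimiter=',')

first_col = data[:, 0]
print(data.shape, data.dtype, first_col)
```

Values are read as floats by default, which is why the output above shows `5.` rather than `5`.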

## SciPy

- Short for Scientific Python
- A powerful scientific and engineering library built on top of NumPy
- It covers a wide range of applications, such as:
    - Optimization: finding the minima and maxima of functions, as in calculus
    - Integration
    - Linear Algebra
    - Differentiation
    - Clustering Algorithms
    - Signal Processing
    - Statistics: probability distributions, hypothesis testing, etc.
    - Image Processing

- `pip install scipy`

In [19]:
from scipy import constants # mathematical and physical constants, plus unit conversion factors

In [20]:
dir(constants)

['Avogadro',
 'Boltzmann',
 'Btu',
 'Btu_IT',
 'Btu_th',
 'G',
 'Julian_year',
 'N_A',
 'Planck',
 'R',
 'Rydberg',
 'Stefan_Boltzmann',
 'Wien',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_codata',
 '_constants',
 '_obsolete_constants',
 'acre',
 'alpha',
 'angstrom',
 'arcmin',
 'arcminute',
 'arcsec',
 'arcsecond',
 'astronomical_unit',
 'atm',
 'atmosphere',
 'atomic_mass',
 'atto',
 'au',
 'bar',
 'barrel',
 'bbl',
 'blob',
 'c',
 'calorie',
 'calorie_IT',
 'calorie_th',
 'carat',
 'centi',
 'codata',
 'constants',
 'convert_temperature',
 'day',
 'deci',
 'degree',
 'degree_Fahrenheit',
 'deka',
 'dyn',
 'dyne',
 'e',
 'eV',
 'electron_mass',
 'electron_volt',
 'elementary_charge',
 'epsilon_0',
 'erg',
 'exa',
 'exbi',
 'femto',
 'fermi',
 'find',
 'fine_structure',
 'fluid_ounce',
 'fluid_ounce_US',
 'fluid_ounce_imp',
 'foot',
 'g',
 'gallon',
 'gallon_US',
 'gallon_imp',
 'gas_co

In [21]:
constants.pi

3.141592653589793

In [22]:
constants.metric_ton

1000.0

In [24]:
x = constants.hp

final_hp = x *10
final_hp

7456.998715822701
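
Most unit constants are conversion factors to SI units, so a conversion is a multiplication. The sketch below is illustrative. `mile`, `hour`, `kilo`, and `convert_temperature` are names from `scipy.constants`, and the quantities are made up.

```python
from scipy import constants

miles_in_m = 5 * constants.mile            # 5 miles in metres
hours_in_s = 2 * constants.hour            # 2 hours in seconds
kw = 10 * constants.hp / constants.kilo    # 10 horsepower in kilowatts

# Temperature needs an offset as well as a scale, so it has its own function
boiling_f = constants.convert_temperature(100, 'Celsius', 'Fahrenheit')

print(miles_in_m, hours_in_s, kw, boiling_f)
```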

In [25]:
from scipy import stats # statistics submodule

In [26]:
x = [1,2,4,6,7,8,8,9,9]

In [33]:
# scipy.mean was a deprecated alias of np.mean and is unavailable in recent SciPy versions - use NumPy instead
np.mean(x)

6.0

In [27]:
# geometric mean
stats.gmean(x)

4.936793014856867

In [28]:
stats.skew(x)

-0.6187184335382291
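
As a sanity check on what `gmean` computes: the geometric mean is the exponential of the mean of the logarithms. The sketch below compares a by-hand version with SciPy's result. The names `by_hand` and `from_scipy` are illustrative.

```python
import numpy as np
from scipy import stats

x = [1, 2, 4, 6, 7, 8, 8, 9, 9]

by_hand = np.exp(np.mean(np.log(x)))   # exp of the average log
from_scipy = stats.gmean(x)

# A negative skew means the longer tail is on the left (towards small values)
skewness = stats.skew(x)

print(by_hand, from_scipy, skewness)
```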

In [30]:
# generate 1000 samples from a standard normal distribution (mean 0, std 1)
sample = stats.norm(0,1).rvs(1000)
sample

array([ 1.30413760e+00, -3.46652787e-02,  9.20150991e-01, -1.45240120e+00,
        1.75379197e+00, -1.51213966e+00, -7.93906320e-01,  1.45178182e+00,
        3.35745168e-01, -4.90789436e-01,  3.41640507e-01, -1.93148902e+00,
        1.04826102e+00, -2.04897267e-01, -1.62884870e+00,  1.22193281e+00,
        1.39032387e+00,  1.04436041e+00, -1.04639188e+00, -6.34202563e-01,
       -6.32304487e-01, -2.19456109e-01, -2.25297493e-01, -7.37230070e-01,
       -5.33135336e-01, -2.14187409e+00, -4.88774807e-01,  1.76265242e+00,
       -1.40120236e+00,  5.96548126e-02,  6.94897981e-03, -4.35989925e-01,
        8.89878146e-01, -1.41089443e+00, -1.61311467e-01, -5.94408208e-01,
        2.12157918e+00,  1.25441241e+00,  8.40706867e-01, -7.79550626e-01,
       -2.59186997e-01, -2.38929042e+00,  1.75930155e-01, -1.29021735e+00,
        3.86891335e-01,  1.02828185e-01, -4.45801671e-01,  2.09327730e+00,
        1.05385094e+00,  3.34869615e-01,  3.67711348e-01,  3.18398680e-01,
       -9.00029640e-01, -
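
Printing 1,000 values is rarely useful. A short sketch of how the sample could be summarised instead, together with two methods of the frozen distribution. The seed passed to `random_state` is an arbitrary choice to make the draw reproducible.

```python
import numpy as np
from scipy import stats

dist = stats.norm(loc=0, scale=1)   # standard normal distribution

# random_state makes the draw reproducible; 0 is an arbitrary seed
sample = dist.rvs(size=1000, random_state=0)

print(sample.shape)
print(np.mean(sample), np.std(sample))   # should be close to 0 and 1
print(dist.cdf(0))                       # probability of a value below the mean
print(dist.ppf(0.975))                   # value below which 97.5% of the mass lies
```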

**Exercise:** Find the derivative of the following expression at $x = 1$:
$$ 3x^2 + 9 x -5 $$

In [31]:
from scipy.misc import derivative # note: deprecated since SciPy 1.10 and removed in 1.12

# define the function to differentiate
def eqn(x):
    return 3 * x**2 + 9*x - 5


In [32]:
derivative(eqn, 1)


15.0
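
If `scipy.misc.derivative` is not available in your SciPy version, the same number can be obtained with a central difference written by hand. This is a minimal sketch, not SciPy's implementation. `central_difference` and `exact_derivative` are hypothetical helpers, and the step size `h` is an assumption.

```python
def eqn(x):
    return 3 * x**2 + 9 * x - 5

def central_difference(f, x0, h=1e-5):
    """Approximate f'(x0) as (f(x0 + h) - f(x0 - h)) / (2h)."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

def exact_derivative(x):
    # d/dx (3x^2 + 9x - 5) = 6x + 9
    return 6 * x + 9

approx = central_difference(eqn, 1)
print(approx, exact_derivative(1))
```

The analytic derivative is $6x + 9$, which gives 15 at $x = 1$ and agrees with the numerical result.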