# Vectorisation
- using operations that act on entire arrays or sequences of data without explicit looping constructs. This saves time because the underlying loop usually runs in compiled, lower-level code (C, in NumPy's case) rather than in the Python interpreter.
- usually done with NumPy arrays and pandas DataFrames (pandas is built on top of NumPy)
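
To make the idea concrete, here is a minimal sketch that computes the same result twice, once with an explicit Python loop and once with a single vectorised expression. The array `values` and the formula `2x + 1` are made up for illustration.

```python
import numpy as np

values = np.arange(1_000, dtype=np.float64)

# Explicit Python loop: one interpreter step per element
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] * 2.0 + 1.0

# Vectorised: the same arithmetic expressed on the whole array at once
vectorised = values * 2.0 + 1.0

# Both approaches give identical results
print(np.array_equal(looped, vectorised))  # True
```

On large arrays the vectorised form is typically much faster, although the exact speed-up depends on the machine and the array size.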

# NumPy

- Universal functions (ufuncs): NumPy functions that operate element-wise on a NumPy array
- calling convention: given an array `a = np.array([...])`, the ufunc named `x` is called as `np.x(a)`, e.g. `np.sqrt(a)`
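
As a small sketch of that calling pattern, the example below applies `np.sqrt` to an array and confirms that it is a ufunc object. The sample values are arbitrary.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])

# The ufunc named "sqrt" is called as np.sqrt(a) and applied to every element
roots = np.sqrt(a)
print(roots)  # [1. 2. 3.]

# Every ufunc is an instance of np.ufunc
print(isinstance(np.sqrt, np.ufunc))  # True
```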

# Common ufuncs and related vectorised functions
- math operations
    - arithmetic: add, subtract, multiply, divide, power, mod, absolute
    - trigonometric: sin, cos, tan
    - exponential and logarithmic: exp, log, log10
- statistics (vectorised NumPy functions, but reductions rather than ufuncs)
    - mean, std, percentile
- logical
    - logical_and, logical_or, logical_not
- bitwise
    - bitwise_and, bitwise_or, bitwise_xor
- comparison
    - greater, greater_equal, less, less_equal, equal, not_equal
- floating point routines
    - isinf, isnan
- rounding
    - floor, ceil, trunc
- others (vectorised NumPy functions, but not ufuncs)
    - where, choose
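
The sketch below calls one function from several of the categories above, then checks which of the listed names are actual `np.ufunc` objects. Functions such as `mean`, `where` and `choose` are vectorised but are not ufuncs. The input arrays are arbitrary sample data.

```python
import numpy as np

a = np.array([-1.5, 0.0, 2.5])

abs_a = np.absolute(a)                     # arithmetic   -> [1.5, 0.0, 2.5]
floored = np.floor(a)                      # rounding     -> [-2.0, 0.0, 2.0]
positive = np.greater(a, 0)                # comparison   -> [False, False, True]
in_range = np.logical_and(a > -2, a < 1)   # logical      -> [True, True, False]
bits = np.bitwise_and(np.array([6, 5]), 3) # bitwise      -> [2, 1]
nan_mask = np.isnan(np.array([1.0, np.nan]))  # floating point -> [False, True]

# Check which names are true ufunc objects
names = ["add", "floor", "isnan", "mean", "where", "choose"]
is_ufunc = {name: isinstance(getattr(np, name), np.ufunc) for name in names}
print(is_ufunc)
```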
    
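# can numpy ufuncs be used on pandas DataFrames and Series?
- yes. One difference is that pandas DataFrames and Series carry an index, and the result of a ufunc keeps it.
- a pandas Series is a 1-D array of values plus an index, so it behaves like a 1-D numpy array but is not identical to one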
- convert a Series to a numpy array with `.to_numpy()` when a plain array is needed
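
A minimal sketch of the point about the index: applying a ufunc to a Series returns a Series with the same index, while applying it to the underlying array returns a plain `ndarray`. The labels and values are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# ufunc applied to a Series: the result is still a Series with the same index
roots = np.sqrt(s)
print(type(roots).__name__)  # Series
print(list(roots.index))     # ['a', 'b', 'c']

# ufunc applied to the underlying array: a plain ndarray without an index
plain = np.sqrt(s.to_numpy())
print(type(plain).__name__)  # ndarray
```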

# can numpy ufuncs be used on python lists/dicts and other built-in data structures?
- yes; numpy ufuncs first convert the input into a numpy array internally, which adds conversion time on every call
- this works seamlessly for array-like python sequences, namely lists and tuples; a set is not array-like (`np.array` wraps it as a single object), so convert it with `list(...)` first
- python dict --> take `.values()` and convert to a list first
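
The following sketch shows how built-in containers can be passed to a ufunc. The `prices` dict and the numbers are hypothetical sample data. It also shows why a set needs converting first: `np.array` treats it as one opaque Python object rather than a sequence.

```python
import numpy as np

prices = {"apple": 1.0, "pear": 4.0, "plum": 9.0}  # hypothetical data

# dict: take the values, turn them into a list, then apply the ufunc
roots_from_dict = np.sqrt(list(prices.values()))

# tuples (like lists) are array-like and can be passed directly
roots_from_tuple = np.sqrt((1.0, 4.0, 9.0))

# a set is wrapped as a single object: 0-dimensional array, dtype object
wrapped = np.array({1.0, 4.0})
print(wrapped.shape, wrapped.dtype)  # () object

# convert the set to a list (sorted here for a stable order) before the ufunc
roots_from_set = np.sqrt(sorted({1.0, 4.0, 9.0}))
```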

In [14]:
# pd series to np array
import pandas as pd
a = pd.Series([1,2,3])
b = a.to_numpy()
b


array([1, 2, 3], dtype=int64)

In [18]:
# list to np array
import numpy as np
a = [1,2,3]
b = np.array(a)
b

array([1, 2, 3])

In [2]:
# NumPy Array Vectorization:

import numpy as np

# Create two NumPy arrays
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Perform a vectorized addition
c = a + b
print(c)  # Output: [ 6  8 10 12]


[ 6  8 10 12]


In [None]:
# Universal Functions (ufuncs) Vectorization:
import numpy as np

# Create a NumPy array
a = np.array([1, 2, 3, 4])

# Use a ufunc to compute the square of each element
b = np.square(a)
print(b)  # Output: [ 1  4  9 16]


In [10]:
# Compare each element of a plain list with 2 (the list is converted to an array first)
b = [1,2,3,4]
np.equal(b, 2)

array([False,  True, False, False])

In [20]:
import pandas as pd
import numpy as np

# Creating a Pandas Series
series = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# np.sum is a reduction rather than an element-wise ufunc, but it also accepts a Series
np.sum(series)

10

In [3]:
# Pandas Operations Vectorization:
import pandas as pd

# Create a pandas DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Perform a vectorized operation to create a new column
df['C'] = df['A'] * df['B']
print(df)


   A  B   C
0  1  4   4
1  2  5  10
2  3  6  18


In [21]:
import numpy as np

# Example using np.where
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Condition
condition = np.array([True, False, True, False])

# np.where returns elements from `y` where `condition` is True, and from `x` otherwise
result = np.where(condition, y, x)
print(result)  # Output: [10  2 30  4]


[10  2 30  4]


In [22]:
# Example using np.choose
choices = np.array([[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]])

# Index array
index_array = np.array([2, 1, 0, 1])

# np.choose uses `index_array` to choose elements from `choices`
result = np.choose(index_array, choices)
print(result)  # Output: [20 11  2 13]


[20 11  2 13]
