# Introduction to PyTorch - Simple Conditional Logic by way of NumPy

### NumPy and PyTorch ways to vectorize a loop and also handle simple conditional logic


One thing that can prevent us from getting full vector performance when converting a loop to a vectorized approach is conditional logic: if/then/else statements inside the original loop.

![SimpleLogic.png](Assets/SimpleLogic.png)

NumPy where lets us handle conditional loops in a fast, vectorized way.

It applies conditional logic to an array to create a new column or update the contents of an existing column.

**Syntax:**
- numpy.where(condition, [x, y, ]/)
- Return elements chosen from x or y depending on condition.

To understand what numpy where does, look at the simple example below, which adds 50 to every element currently greater than 5:
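As a quick NumPy illustration of the same idea (a minimal sketch; the names `a` and `result` are made up for this example): wherever the condition is true the first choice is used, and everywhere else the second.

```python
import numpy as np

a = np.arange(10)                     # [0, 1, ..., 9]
result = np.where(a > 5, a + 50, a)   # if a > 5 then a + 50, else a
print(result)
```

The condition, both choices, and the result all have the same shape, so no explicit loop is needed.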



# Exercises:

Do a page search for each **Exercise** in this notebook. Complete all exercises. Code in the cells above each exercise may give insight into a solid approach.

In [None]:
import torch
import numpy as np
from math import log10 as lg10
import time
import matplotlib.pyplot as plt
import random
%matplotlib inline

In [None]:
a = torch.arange(10)
torch.where(a > 5, a + 50, a )
# if a > 5 then return a + 50
# else return a

This could come in handy for many AI applications, but let's choose labeling data.

There may be better ways to binarize data, but here is a simple example of converting continuous data into categorical values.

**arr = np.array([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])**

Let's say all values 10 and above represent a medical parameter threshold that indicates further testing, while values below 10 indicate normal range

We might like to print the values as words such as 
**['More Testing', 'Normal', 'More Testing', 'More Testing', ...]**



In [None]:
arr = np.array([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])
np.where(arr < 10, 'Normal', 'More Testing')

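# PyTorch tensors want numbers!

NumPy where can return strings, but PyTorch tensors hold only numeric (or boolean) data. In the cells that follow, the NumPy version succeeds, while the torch.where version fails because we are passing strings in.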
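One possible workaround for the string limitation is to keep the tensor numeric by producing integer codes, and translate the codes to words only at the end. This is a minimal sketch, not the only way to do it; the `label_names` lookup list is a hypothetical helper introduced for illustration.

```python
import torch

arr = torch.tensor([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])
label_names = ['Normal', 'More Testing']   # hypothetical lookup: code -> label

# torch.where chooses between two integer tensors, so everything stays numeric
codes = torch.where(arr < 10,
                    torch.zeros_like(arr, dtype=torch.long),
                    torch.ones_like(arr, dtype=torch.long))

# translate the integer codes to strings outside of PyTorch
labels = [label_names[c] for c in codes.tolist()]
print(codes)
print(labels)
```

Integer codes are also what most classifiers expect as targets, so the strings are only needed for display.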

In [None]:
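arr = np.array([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])
# NumPy accepts string outputs, so this call succeeds and the except branch is not reached
try:
    print(np.where(arr < 10, 'Normal', 'More Testing') )
except Exception:
    print("This where clause crashed due to using strings")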

In [None]:
arr = torch.tensor([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])
try:
    print(torch.where(arr < 10, 'Normal', 'More Testing') )
except Exception as err:  # torch.where rejects the string arguments
    print("This where clause crashed due to using strings:", type(err).__name__)

In [None]:
arr = torch.tensor([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])
torch.where(arr < 10, arr*2+1, arr*-1)

Or we could discretize (bin) data for use in a classifier
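For more than two bins, nested where calls work, and NumPy's digitize is an alternative worth knowing. The sketch below assumes the bin edges 6 and 12 and compares both approaches on the sample array.

```python
import numpy as np

arr = np.array([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])

# nested where: x < 6 -> 0, 6 <= x < 12 -> 1, otherwise 2
nested = np.where(arr < 6, 0, np.where(arr < 12, 1, 2))

# np.digitize with edges [6, 12] assigns the same bin numbers
binned = np.digitize(arr, bins=[6, 12])
print(nested)
print(binned)
```

With many bins, a list of edges is usually easier to read and maintain than deeply nested where calls.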

In [None]:
# Simple NumPy discretizer
# convert continuous data to discrete integer bins
arr = np.array([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])
print(np.where(arr < 6, 0, np.where(arr < 12, 1, 2)))


In [None]:
# Simple PyTorch discretizer
# convert continuous data to discrete integer bins
arr = torch.tensor([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])
print(torch.where(arr < 6, 0, torch.where(arr < 12, 1, 2)))

### Numpy Where to find rows and columns of conditions

Given a mask of TRUE/FALSE values, we will 
- generate  a new array with a 1 at every location TRUE is located
- generate a -1 at every location a FALSE is located

**Apply a mask**

In [None]:
#Apply a mask of True/False array to select or manipulate elements
a = np.ones((3,3))  # a contains all 1's

print("initial array a\n", a)
# Given a mask of true/ false values, we will generate 
# a new array with a 1 at every location TRUE is located
# a -1 at every location a FALSE is located
mask = [[False,True,True],[False,True,False],[True,False,True]]
print("\nmask")
for el in mask: # simple loop to print the mask
    print(el)
               
testing_array = np.where(mask,a,-a)

print("\ntesting_array\n",testing_array)

# now we can find where all the ones are by row and column
print("row index (where ones are): ",np.where(testing_array > 0)[0])
print("col index (where ones are): ",np.where(testing_array > 0)[1])

This can be used the other way to **create a mask** given a condition or threshold

In [None]:
# create a mask for indexing, to later manipulate arrays
a = np.ones((3,3))  # a contains all 1's

print("initial array a\n", a)
               
mask = [[False,True,True],[False,True,False],[True,False,True]]
print("\nmask")
for el in mask: # simple loop to print the mask
    print(el)
    
testing_array = np.where(mask,a,-a)

print("\ntesting_array\n",testing_array)

WentTheOtherWay = np.where(testing_array > 0,True, False)

print("\nWentTheOtherWay\n",WentTheOtherWay)

# now we can find where all the ones are by row and column
print("row index (where ones are): ",np.where(WentTheOtherWay > 0)[0])
print("col index (where ones are): ",np.where(WentTheOtherWay > 0)[1])

In [None]:
# create a mask for indexing, to later manipulate arrays
a = torch.ones((3,3))  # a contains all 1's

print("initial array a\n", a)
               
mask = torch.tensor([[False,True,True],[False,True,False],[True,False,True]])
print("\nmask")
for el in mask: # simple loop to print the mask
    print(el)
    
testing_array = torch.where(mask,a,-a)

print("\ntesting_array\n",testing_array)

WentTheOtherWay = testing_array > 0  # boolean mask, stays a PyTorch tensor

print("\nWentTheOtherWay\n",WentTheOtherWay)

# now we can find where all the ones are by row and column
print("row index (where ones are): ",torch.where(WentTheOtherWay)[0])
print("col index (where ones are): ",torch.where(WentTheOtherWay)[1])

## PyTorch/NumPy.where Multiplication Table Example

Find all locations of a value in a multiplication table

Find all (rows, cols) of the value 24 in a multiplication table
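When where is called with only a condition, it returns one index array per dimension. A small sketch of pairing those arrays into (row, col) tuples follows; the names `table` and `pairs` are illustrative.

```python
import numpy as np

num_line = np.arange(1, 11).reshape(10, 1)
table = num_line * num_line.T          # 10x10 multiplication table

rows, cols = np.where(table == 24)     # one index array per dimension
pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)                           # 0-based (row, col) positions
```

The indices are 0-based, so position (2, 7) corresponds to 3 x 8 in the table.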

In [None]:
numLine = np.arange(1, 11).reshape(10,1)
MultiplicationTable = numLine * numLine.T
MultiplicationTable
np.where( MultiplicationTable == 24)

In [None]:
numLine = torch.arange(1, 11).reshape(10,1)
MultiplicationTable = numLine * numLine.T
MultiplicationTable
torch.where( MultiplicationTable == 24)

# Exercise:

Find all (rows, cols) of all **multiples of 12 or multiples of 9** in a 10x10 multiplication table and make all other values 0. Preserve the first row and first column as readable indexes for the table.

# Python loopy approach

In [None]:
N = 10
timing = {}
A = np.arange(0, N)
A_torch = torch.arange(0, N)
#A_torch = torch.tensor(A)
t1 = time.time()
B = np.zeros((N,N))
B_torch = torch.zeros((N,N))
for i in range(N):
    row = []
    for j in range(N):
        B[i,j] = (i+1)*(j+1)
        if B[i,j] % 9 != 0 and B[i,j] %12 != 0:
            B[i,j] = 0
        row.append(int(B[i,j]))
    print(row)
t2 = time.time()
loop = t2-t1
timing['loop'] = loop
print("Elapsed ", t2-t1)


In [None]:
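## one solution - puts the first row and column back as index edges for easy checking

arr = np.array([1,2,3,4,5,6,7,8,9,10])
table = arr.reshape(10,1)*arr
res = np.where( (table % 9) == 0, table, np.where( (table % 12) == 0, table, 0))
res[0,:] = table[0,:]  # put edges back in to check result
res[:,0] = table[:,0]  # put edges back in to check result
res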

In [None]:
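## one solution - puts the first row and column back as index edges for easy checking
arr = torch.tensor([1,2,3,4,5,6,7,8,9,10])
table = arr.reshape(10,1)*arr
res = torch.where( (table % 9) == 0, table, torch.where( (table % 12) == 0, table, 0))
res[0,:] = table[0,:]  # put edges back in to check result
res[:,0] = table[:,0]  # put edges back in to check result
res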

## Numpy Where applied to California Housing data

In an AI context, this could mean applying a categorical classifier to otherwise continuous values. For example, in the California housing dataset the target price variable is continuous.

## Fictitious scenario

A new stimulus package is considered whereby new house buyers will be given a coupon worth 50,000 off toward the purchase of houses in California whose price (prior to the coupon) is between 250,000 and 350,000. Other prices will be unaffected. Generate an array with the adjusted targets.


In [None]:
# Fictitious scenario:
from sklearn.datasets import fetch_california_housing

california_housing = fetch_california_housing(as_frame=True)
X = california_housing.data.to_numpy()
buyerPriceRangeLo = 250_000/100_000
buyerPriceRangeHi= 350_000/100_000
T = california_housing.target.to_numpy() 
t1 = time.time()
timing = {}
New = np.empty_like(T)
for i in range(len(T)):
    if ( (T[i] < buyerPriceRangeHi) & (T[i] >= buyerPriceRangeLo) ):
        New[i] = T[i] - 50_000/100_000
    else:
        New[i] = T[i]
t2 = time.time()
plt.title( "California Housing Dataset - conditional Logic Applied")
plt.scatter(T, New, color = 'b')
plt.grid()
print("time elapsed: ", t2-t1)
timing['loop'] = t2-t1

## Exercise:

Duplicate the above condition using NumPy where

In [None]:
t1 = time.time()
#############################################################################
### Exercise: Add or modify code below to compute the same results as the loop above
#New = np.where(() & (), (), ()) 
New = np.where((T < buyerPriceRangeHi) & (T >= buyerPriceRangeLo), T - 50_000/100_000, T ) 

##############################################################################
t2 = time.time()

plt.scatter(T, New, color = 'r')
plt.grid()
print("time elapsed: ", t2-t1)
timing['np.where'] = t2-t1
print("Speedup: {:4.1f}X".format( timing['loop']/timing['np.where']))

In [None]:
X = torch.tensor(california_housing.data.to_numpy())
buyerPriceRangeLo = 250_000/100_000
buyerPriceRangeHi= 350_000/100_000
T = torch.tensor(california_housing.target.to_numpy() )

t1 = time.time()
#############################################################################
### Exercise: Add or modify code below to compute the same results as the loop above
#New = torch.where(() & (), (), ()) 
New = torch.where((T < buyerPriceRangeHi) & (T >= buyerPriceRangeLo), T - 50_000/100_000, T ) 

##############################################################################
t2 = time.time()

plt.scatter(T, New, color = 'r')
plt.grid()
print("time elapsed: ", t2-t1)
timing['torch.where'] = t2-t1
print("Speedup: {:4.1f}X".format( timing['loop']/timing['torch.where']))

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(10,6))
plt.title("Plot of various method of computing California Housing Discount Rebate!")
plt.ylabel("Time in seconds",fontsize=12)
plt.xlabel("Various types of operations",fontsize=14)
plt.xticks(rotation=-60)
plt.grid(True)
plt.bar(x = range(len(timing)), height=list(timing.values()), align='center', tick_label=list(timing.keys()))
print('Acceleration : {:4.0f} X'.format(timing['loop']/timing['torch.where']))

As you can see, we generated the same data with the vectorized where as we did with the original loop, but we did so about 13X faster (the speedup amount may vary from run to run and from machine to machine)
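Single time.time() measurements are noisy, which is one reason the speedup varies. One hedged way to get steadier numbers is the standard library's timeit, taking the best of a few repeats. The sketch below uses synthetic data as a stand-in for the housing targets (an assumption, not the real dataset), and the helper names `with_loop` and `with_where` are made up for this example.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
T = rng.uniform(0.5, 5.0, size=20_000)   # synthetic stand-in for the targets
lo, hi = 2.5, 3.5                        # price range in units of 100,000

def with_loop():
    out = np.empty_like(T)
    for i in range(len(T)):
        out[i] = T[i] - 0.5 if lo <= T[i] < hi else T[i]
    return out

def with_where():
    return np.where((T >= lo) & (T < hi), T - 0.5, T)

# best of 3 runs for each approach
loop_s = min(timeit.repeat(with_loop, number=1, repeat=3))
where_s = min(timeit.repeat(with_where, number=1, repeat=3))
print("loop: {:.4f}s  np.where: {:.6f}s".format(loop_s, where_s))
```

Taking the minimum over repeats reduces the influence of unrelated system activity on the reported time.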

# Numpy Select to handle conditional logic

![ConditionalLogic.png](Assets/ConditionalLogic.png)
![SimpleLogic.png](Assets/SimpleLogic.png)

Apply conditional logic to an array to create a new column or update the contents of an existing column. This method handles more complex conditional scenarios than numpy where.

**Syntax:**
- numpy.select(condlist, choicelist, default=0)
- Return an array drawn from elements in choicelist, depending on conditions.

This is a very useful function for handling conditionals that would otherwise slow down a map or apply, or add complexity when reading the code
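A tiny sketch of how np.select resolves several conditions, using made-up values: conditions are checked in order, the first one that is true for an element decides its value, and elements matching no condition get the default.

```python
import numpy as np

x = np.array([1, 4, 6, 9])
conditions = [x < 3, x < 7]        # checked in order; the first True wins
choices = [x * 10, x * 100]        # one choice array per condition
result = np.select(conditions, choices, default=-1)
print(result)
```

Here the value 1 satisfies both conditions but takes the first choice, while 9 satisfies neither and falls through to the default.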

First we will create some new data


In [None]:
import time

BIG = 10_000_000

np.random.seed(2022)
A = np.random.randint(0, 11, size=(BIG, 6))
print(A[:5])

In [None]:
import time

BIG = 10_000_000

torch.manual_seed(2022)  # same seed value, but PyTorch's generator yields different numbers than NumPy's

A_torch =torch.randint(0, 11, size=(BIG, 6))
print(A_torch[:5])

Find loops with a large number of iterations

If they contain conditional logic:
- consider np.where or np.select

else
- Try to find a NumPy replacement using ufuncs, aggregations, etc
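For the case without conditional logic, here is a minimal sketch of replacing a loop with a ufunc plus an aggregation. The data and the sum-of-squares task are made up for illustration.

```python
import numpy as np

values = np.array([3.0, -1.5, 4.0, -2.0, 0.5])

# loop version: accumulate the sum of squares one element at a time
total_loop = 0.0
for v in values:
    total_loop += v * v

# vectorized version: np.square is a ufunc, np.sum is an aggregation
total_vec = np.sum(np.square(values))
print(total_loop, total_vec)
```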

Below is a loop of 10,000,000 iterations (the value of BIG), with a messy set of conditions

Look for a way to summarize these conditions using a numpy select statement if possible

# Brute Force Approach (Big Loop)

In [None]:
# NumPy approach
timing = {}
t1 = time.time()
for i in range(BIG):
    if A[i,4] == 10:
        A[i,5] =  A[i,2] * A[i,3]
    elif (A[i,4] < 10) and (A[i,4] >=5):
        A[i,5] =   A[i,2] + A[i,3]
    elif A[i,4] < 5:
        A[i,5] =   A[i,0] + A[i,1]
t2 = time.time()
baseTime = t2- t1
print(A[:5,:])
print("time: ", baseTime)
timing['Naive Loop'] = t2 - t1

# Try Vectorizing with masks 

Just remove the references to i and remove the loop, create mask for each condition
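Each mask needs to be a boolean array with one entry per row, so the comparisons are combined with the elementwise `&` operator rather than Python's `and` or a reduction such as `.any()`. A small sketch on a made-up three-row array:

```python
import numpy as np

A = np.array([[1, 2, 3, 4, 10, 0],
              [1, 2, 3, 4,  7, 0],
              [1, 2, 3, 4,  2, 0]])

mask1 = A[:, 4] == 10
mask2 = (A[:, 4] < 10) & (A[:, 4] >= 5)   # elementwise &, one bool per row
mask3 = A[:, 4] < 5

A[mask1, 5] = A[mask1, 2] * A[mask1, 3]   # rows where column 4 is 10
A[mask2, 5] = A[mask2, 2] + A[mask2, 3]   # rows where 5 <= column 4 < 10
A[mask3, 5] = A[mask3, 0] + A[mask3, 1]   # rows where column 4 < 5
print(A[:, 5])
```

Because the three masks are mutually exclusive, the order of the assignments does not matter here.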


In [None]:
# Try Vectorizing simply NumPy
t1 = time.time()
mask1 = A[:,4] == 10
A[mask1,5] =  A[mask1,2] * A[mask1,3]
mask2 = (A[:,4] < 10) & (A[:,4] >= 5)  # elementwise mask, one boolean per row
A[mask2,5] =   A[mask2,2] + A[mask2,3]
mask3 = A[:,4] < 5
A[mask3,5] =   A[mask3,0] + A[mask3,1]
t2 = time.time()
print(A[:5,:])
print("time :", t2-t1)

fastest_time = t2-t1
Speedup = baseTime / fastest_time
print("Speed up: {:4.0f} X".format(Speedup))
timing['Vector Masks NumPy'] = t2 - t1

In [None]:
# Try Vectorizing simply PyTorch
t1 = time.time()
A_torch = torch.tensor(A)
mask1 = A_torch[:,4] == 10
A_torch[mask1,5] =  A_torch[mask1,2] * A_torch[mask1,3]
mask2 = (A_torch[:,4] < 10) & (A_torch[:,4] >= 5)  # elementwise mask, one boolean per row
A_torch[mask2,5] =   A_torch[mask2,2] + A_torch[mask2,3]
mask3 = A_torch[:,4] < 5
A_torch[mask3,5] =   A_torch[mask3,0] + A_torch[mask3,1]
t2 = time.time()
print(A_torch[:5,:])
print("time :", t2-t1)

fastest_time = t2-t1
Speedup = baseTime / fastest_time
print("Speed up: {:4.0f} X".format(Speedup))
timing['Vector Masks PyTorch'] = t2 - t1

# Try Vectorizing with select

### Much cleaner logic

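- put the conditions inside a list of boolean arrays
- put the matching choices inside a list of arrays
- result = np.select(condition, choice, default)

PyTorch has no direct equivalent of np.select (torch.select is unrelated: it slices a tensor along a dimension), but you can write one yourself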

In [None]:
from functools import reduce
def my_select1(c, v, d=0):
    # recursive version: the first matching condition wins, as in np.select
    if not c:
        return d
    return torch.where(c[0], v[0], my_select1(c[1:], v[1:], d))

def my_select2(c, v, d=0):
    zipped = reversed(list(zip(c, v)))
    return reduce(lambda o, a: torch.where(*a, o), zipped, d)

In [None]:
# np.select(condlist, choicelist, default=0)
t1 = time.time()

condition = [ (A[:,4]  < 10) & (A[:,4] >= 5),
              ( A[:,4] < 5)]
choice = [ (A[:,2] + A[:,3]), 
           (A[:,0] + A[:,1] ) ]
default = A[:,2] * A[:,3]  # used where no condition matches, i.e. A[:,4] == 10
A[:,5] = np.select(condition, choice, default=default)

t2 = time.time()
print(A[:5,:])
print("time :", t2-t1)
fastest_time = t2-t1
Speedup = baseTime / fastest_time
print("Speed up: {:4.0f} X".format(Speedup))
timing['Numpy Select'] = t2 - t1

In [None]:
# custom PyTorch select: my_select2(condlist, choicelist, default)
t1 = time.time()

condition =  [ (A_torch[:,4]  < 10) & (A_torch[:,4] >= 5),
              ( A_torch[:,4] < 5)] 

choice =  [ (A_torch[:,2] + A_torch[:,3]), 
           (A_torch[:,0] + A_torch[:,1] ) ] 

A_torch[:,5] = my_select2(condition, choice, d = (A_torch[:,2] * A_torch[:,3]))

t2 = time.time()
print(A_torch[:5,:])
print("time :", t2-t1)
fastest_time = t2-t1
Speedup = baseTime / fastest_time
print("Speed up: {:4.0f} X".format(Speedup))
timing['PyTorchCustom Select'] = t2 - t1

In [None]:
plt.figure(figsize=(10,6))
plt.title("Time taken to process {:,} records in seconds".format(BIG),fontsize=12)
plt.ylabel("Time in seconds",fontsize=12)
plt.xlabel("Various types of operations",fontsize=14)
plt.grid(True)
plt.xticks(rotation=-60)
plt.bar(x = list(timing.keys()), height= list(timing.values()), align='center',tick_label=list(timing.keys()))

## Exercise: Numpy Select

Find all (rows, cols) of all multiples of 12, 15, or 21 in a multiplication table and make all other values 0 using numpy select

In [None]:
# numpy.select(condlist, choicelist, default)
numLine = np.arange(1, 11).reshape(10,1)
multT = numLine * numLine.T

# condition = [(), (), ()]
# choice = [(), (), ()]
# default =[()]
# res = np.select(condition, choice, default)

# res[0,:] = multT[0,:]  # put edges back in to check result
# res[:,0] = multT[:,0]  # put edges back in to check result
# res

In [None]:
# numpy approach

numLine = np.arange(1, 11).reshape(10,1)
multT = numLine * numLine.T

condition = [(multT%12 == 0), (multT%15 == 0), (multT%21 == 0)]
choice = [multT, multT, multT]
default = 0

res = np.select(condition, choice, default)
# res[0,:] = multT[0,:]  # put edges back in to check result
# res[:,0] = multT[:,0]  # put edges back in to check result
res

In [None]:
# PyTorch tensor approach (uses the my_select2 helper defined earlier)
numLine = torch.arange(1, 11).reshape(10,1)
multT = numLine * numLine.T

condition = [(multT%12 == 0), (multT%15 == 0), (multT%21 == 0)]
choice = [multT, multT, multT]
default = torch.zeros_like(multT)

res = my_select2(condition, choice, default)
# res[0,:] = multT[0,:]  # put edges back in to check result
# res[:,0] = multT[:,0]  # put edges back in to check result
res

# List of days

### AKA showcasing fancy slicing

In machine learning, feature engineering for data containing dates and times needs special preprocessing.

It is fairly common to create new columns from datetime data to explicitly call out Day of Week (DOW), Day of Year (DOY), Day of Month (DOM), quarter, hour of day, minutes and seconds of day, and so on.

This is because cyclical patterns may exist, or special handling may be needed to handle exceptions. For example, reporting revenue for weekdays as opposed to weekends.
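As a rough sketch of deriving such columns with NumPy alone, the example below builds day-of-week and day-of-month arrays for one month. The month (January 2022) is an arbitrary choice for illustration, and the Monday=0 convention is an assumption of this sketch.

```python
import numpy as np

# hypothetical month chosen for illustration
days = np.arange('2022-01-01', '2022-02-01', dtype='datetime64[D]')

# days since 1970-01-01 (a Thursday); shift so Monday=0 ... Sunday=6
day_of_week = (days.astype('int64') + 3) % 7

# offset in days from the first of the month, plus 1
day_of_month = (days - days.astype('datetime64[M]')).astype('int64') + 1

is_weekend = day_of_week >= 5          # Saturday or Sunday
print(day_of_month[is_weekend])
```

The boolean `is_weekend` array can then be used as a mask, or passed to np.where, to split weekday and weekend records.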

In the example below, we will assume the first Saturday falls on day 3 and we want to efficiently report each weekend day of the month

Goal: grab the subset of data for weekend days into a numpy array

Demonstrate the approach using slicing as well as np.where()

In [None]:
import numpy as np
# create simple array of data with days info
a = np.array([i for i in range(21)])
a_torch = torch.tensor([i for i in range(21)])
a,a_torch

In [None]:
# step by 7, starting at index 3 (the first Saturday)
print("Here are the days of the month that are called Saturday")
a[3::7], a_torch[3::7]

In [None]:
# step by 7, starting at index 4 (the first Sunday)
print("Here are the days of the month that are called Sunday")
a[4::7], a_torch[4::7]

# Here is a list of all the weekend days

In [None]:
# indices for weekend days
start = 3 # say Saturday of interest starts on the day 3 of the month
blist = list(zip(a[start::7],a[start+1::7]))
blist_torch = list(zip(a_torch[start::7].numpy(),a_torch[start+1::7].numpy()))
blist, blist_torch

In [None]:
np.array(blist).flatten()

In [None]:
idx = np.where((a%7==3) | (a%7==4))
a[idx]

In [None]:
idx = torch.where((a_torch%7==3) | (a_torch%7==4))
a_torch[idx].numpy()

In [None]:
print("Done")