In [1]:
# Go over quiz
# Unit test example
# Broadcasting example 

In [2]:
import numpy as np

a=np.array([1,2,3])
b=np.array([4,5])

In [8]:
#This produces an error
#a+b

Indeed, adding arrays of different sizes feels "unnatural". But note that the error message said 

"ValueError: operands could not be broadcast together with shapes (3,) (2,)"

It didn't just say "shapes don't match."

Sometimes you **can** add arrays of different shapes, and this is called broadcasting.
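
To read the exact wording of the error without crashing the notebook, the failing addition can be wrapped in a try/except. This is only a small sketch; the variable name `message` is an illustrative choice.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])

try:
    a + b  # shapes (3,) and (2,) cannot be broadcast
except ValueError as err:
    message = str(err)
    print(message)
```

The message talks about broadcasting rather than a plain shape mismatch, which hints that some mismatched shapes are allowed.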


In [17]:
a=np.ones(3)
b=np.zeros(3)
b=b.reshape(1,3)
a,b


(array([1., 1., 1.]), array([[0., 0., 0.]]))

In [18]:
#It looks like we should be able to add a and b, but the shapes differ
a.shape,b.shape

((3,), (1, 3))

### First rule of broadcasting.
If one array has more dimensions than the other, take the array with fewer dimensions and 'pad' its shape with ones __ON THE LEFT__.

If you type a+b, numpy will automatically treat a as an array with shape (1,3), and then the addition works fine.
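
As a rough check of the first rule, the left-padding can be done by hand with `np.newaxis` and compared with what numpy does automatically. The name `a_padded` is made up for this sketch.

```python
import numpy as np

a = np.ones(3)                 # shape (3,)
b = np.zeros(3).reshape(1, 3)  # shape (1, 3)

# Pad a with a one on the left by hand: (3,) becomes (1, 3)
a_padded = a[np.newaxis, :]
print(a_padded.shape)

# The automatic and the manual version give the same result
print(np.array_equal(a + b, a_padded + b))
```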


In [19]:
a+b

array([[1., 1., 1.]])

In [21]:
#Example 2
a=np.zeros((3,3))
b=np.arange(3)
print(a.shape,b.shape)
a,b

(3, 3) (3,)


(array([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]),
 array([0, 1, 2]))

What happens if we try to add a and b?

By the first rule of broadcasting, b gets padded with a one, so now a has shape (3,3) and b has shape (1,3).

### Second rule of broadcasting. 
If arrays have the same number of dimensions, you can stretch out ones: an axis of size 1 is repeated until it matches the size of the other array along that axis.

This means that b will get converted from array([0,1,2]) to array([[0,1,2],[0,1,2],[0,1,2]]),
the 3 by 3 array where every row is a copy of the original b.

Mathematically, we have the formula new_b[i,j]=old_b[j].

Axis 0 is the axis we are "stretching over", which is why i doesn't appear on the right-hand side.
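
One way to look at the stretched array is `np.broadcast_to`, which returns a read-only view with the requested shape. The sketch below assumes the same a and b as in Example 2 and checks the formula new_b[i,j]=old_b[j] entry by entry.

```python
import numpy as np

a = np.zeros((3, 3))
b = np.arange(3)

# Stretch b to the shape of a (a read-only view, no data is copied)
new_b = np.broadcast_to(b, a.shape)
print(new_b)

# Check new_b[i, j] == b[j] for every i and j
print(all(new_b[i, j] == b[j] for i in range(3) for j in range(3)))
```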

In [22]:
a+b

array([[0., 1., 2.],
       [0., 1., 2.],
       [0., 1., 2.]])

### Third rule of broadcasting.

Sometimes you are out of luck. If the arrays have the same number of dimensions, the sizes along some axis don't match, and neither of those sizes is a one, you can't add, i.e., you can only stretch ones.
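
The three rules can be collected into a small shape checker. The function `broadcast_shape` below is a hypothetical helper written for illustration, not part of numpy; it is a sketch of the rules as stated here and works on shape tuples only.

```python
def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape or raise ValueError."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on the left
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for m, n in zip(padded_a, padded_b):
        if m == n or n == 1:
            result.append(m)  # sizes agree, or b's one is stretched (Rule 2)
        elif m == 1:
            result.append(n)  # a's one is stretched (Rule 2)
        else:
            # Rule 3: sizes differ and neither is a one
            raise ValueError(f"cannot broadcast shapes {shape_a} and {shape_b}")
    return tuple(result)

print(broadcast_shape((3,), (1, 3)))    # (1, 3)
print(broadcast_shape((3, 3), (3,)))    # (3, 3)
print(broadcast_shape((1, 3), (3, 1)))  # (3, 3)

try:
    broadcast_shape((3, 4), (4, 3))
except ValueError as err:
    print(err)
```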

In [25]:
#This gives you a ValueError
#A=np.zeros((3,4))
#B=np.zeros((4,3))
#A+B

### More Complicated Example

In [34]:
a=np.arange(3)
a=a.reshape(1,3)
b=2*np.arange(3)
b=b.reshape(3,1)
print(a.shape,b.shape)
a,b

(1, 3) (3, 1)


(array([[0, 1, 2]]),
 array([[0],
        [2],
        [4]]))

Can we add? Yes! Along each axis, the sizes either match or one of them is a one.

a gets stretched out along axis 0 to form a 3 by 3 array. We are stretching along axis 0, so the rule is 
a_new[i,j]=a[0,j], so a becomes

array([[0,1,2],[0,1,2],[0,1,2]])

b, on the other hand, gets stretched along axis 1, so it obeys the rule b_new[i,j]=b[i,0],

so b gets converted from 0,2,4 to 

array([[0,0,0],[2,2,2],[4,4,4]])

Now, since the stretched a and b are both 3 by 3, we can add.

In [37]:
a+b

array([[0, 1, 2],
       [2, 3, 4],
       [4, 5, 6]])

Note that a_new and b_new both live "under the hood." You don't actually see them.
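
For inspection only, the stretched arrays can be produced explicitly with `np.broadcast_arrays`. This is a sketch using the same a and b as above; ordinary code would simply write a+b.

```python
import numpy as np

a = np.arange(3).reshape(1, 3)
b = (2 * np.arange(3)).reshape(3, 1)

# Ask numpy for the stretched versions that a+b uses implicitly
a_new, b_new = np.broadcast_arrays(a, b)
print(a_new)
print(b_new)

# Adding the stretched arrays matches the broadcast sum
print(np.array_equal(a_new + b_new, a + b))
```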

In [38]:
#Example
np.arange(3)+4

array([4, 5, 6])

4 is stretched to array([4,4,4])

# Normalizing Data

In [43]:
#Here is some random data
X=np.random.rand(10,3)
X

array([[0.61440315, 0.68466783, 0.64857397],
       [0.20475235, 0.09501762, 0.80840835],
       [0.32957455, 0.33889614, 0.74453961],
       [0.2760332 , 0.51612241, 0.87956672],
       [0.69628698, 0.62350945, 0.37273784],
       [0.79106603, 0.9561744 , 0.15052061],
       [0.44403495, 0.59100687, 0.82818   ],
       [0.29162274, 0.37630425, 0.63401667],
       [0.70889723, 0.01497621, 0.31419256],
       [0.87173926, 0.09397125, 0.29954175]])

Interpretation. Each row is a new person. Each column is a different 'assessment'. Can we manipulate our data so that each column has mean zero and variance 1?

In [48]:
#mean of each column
mu=X.mean(axis=0)
mu

array([0.52284104, 0.42906464, 0.56802781])

What happens if we type X-mu? 

X has shape (10,3) and mu has shape (3,), so by the first rule of broadcasting, mu gets padded with a one to have shape (1,3).

Then, by the second rule of broadcasting, mu gets stretched out to a 10 by 3 matrix where each row is a copy of the original mu.
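
The two steps can be spelled out by hand and compared with the automatic result. This sketch generates its own random X, so the numbers will differ from the ones shown in this notebook; `mu_stretched` is an illustrative name.

```python
import numpy as np

X = np.random.rand(10, 3)
mu = X.mean(axis=0)  # shape (3,)

# Rules 1 and 2 by hand: (3,) -> (1, 3) -> (10, 3)
mu_stretched = np.broadcast_to(mu.reshape(1, 3), X.shape)
print(mu.shape, mu_stretched.shape)

# Same result as letting numpy broadcast mu automatically
print(np.array_equal(X - mu, X - mu_stretched))
```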

In [51]:
Centered = X-mu
Centered

array([[ 0.09156211,  0.25560319,  0.08054616],
       [-0.31808869, -0.33404702,  0.24038054],
       [-0.19326649, -0.0901685 ,  0.17651181],
       [-0.24680785,  0.08705776,  0.31153892],
       [ 0.17344594,  0.1944448 , -0.19528997],
       [ 0.26822499,  0.52710975, -0.4175072 ],
       [-0.0788061 ,  0.16194222,  0.26015219],
       [-0.2312183 , -0.05276039,  0.06598886],
       [ 0.18605618, -0.41408843, -0.25383525],
       [ 0.34889821, -0.33509339, -0.26848606]])

In [52]:
Centered.mean(axis=0)

array([-2.22044605e-17,  3.33066907e-17, -5.55111512e-17])

Okay, now let's make the variance of each column equal to 1.

In [53]:
sigma = X.std(axis=0)
sigma

array([0.22932017, 0.28653953, 0.24768622])

In [55]:
Normalized=(X-mu)/sigma
Normalized

array([[ 0.39927629,  0.89203465,  0.32519437],
       [-1.38709427, -1.16579733,  0.97050429],
       [-0.84278017, -0.31468085,  0.71264283],
       [-1.07625879,  0.30382462,  1.25779674],
       [ 0.75634837,  0.67859678, -0.78845714],
       [ 1.16965285,  1.83957078, -1.68562952],
       [-0.34365097,  0.56516537,  1.05032971],
       [-1.00827723, -0.18412954,  0.26642119],
       [ 0.81133808, -1.44513543, -1.02482588],
       [ 1.52144583, -1.16944906, -1.08397658]])

In [57]:
Normalized.mean(axis=0), Normalized.std(axis=0)

(array([-2.22044605e-17,  1.33226763e-16, -2.44249065e-16]),
 array([1., 1., 1.]))

In [59]:
#Note that the rows do not have mean zero or variance 1

Normalized.mean(axis=1), Normalized.std(axis=1)


(array([ 0.5388351 , -0.52746243, -0.14827273,  0.16178752,  0.215496  ,
         0.44119804,  0.42394804, -0.30866186, -0.55287441, -0.24399327]),
 array([0.25157434, 1.0630683 , 0.64580901, 0.95815263, 0.71061136,
        1.52856013, 0.57778444, 0.52779114, 0.97978631, 1.24884155]))
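
If the rows were the thing to normalize instead, the first rule gets in the way: X.mean(axis=1) has shape (10,), which pads on the left to (1,10) and cannot be stretched to match (10,3). One possible workaround, sketched here with fresh random data and the illustrative names `row_mu`, `row_sigma` and `RowNormalized`, is to pass `keepdims=True` so the mean keeps shape (10,1).

```python
import numpy as np

X = np.random.rand(10, 3)

row_mu = X.mean(axis=1)  # shape (10,)
try:
    X - row_mu  # (10,) pads to (1, 10), which clashes with (10, 3)
except ValueError as err:
    print(err)

# keepdims=True leaves a one in the reduced axis: shape (10, 1)
row_mu = X.mean(axis=1, keepdims=True)
row_sigma = X.std(axis=1, keepdims=True)
RowNormalized = (X - row_mu) / row_sigma

print(np.allclose(RowNormalized.mean(axis=1), 0))
print(np.allclose(RowNormalized.std(axis=1), 1))
```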

# Unittesting

Let's first build a sqrt function.


In [12]:
def sqrt(x):
    """
    custom sqrt function with error messages for invalid input
    """
    if type(x) not in [int,float]:
        raise TypeError("This function only works with numerical data types")
    elif x<0:
        raise ValueError("x must be nonnegative")
    else:
        return x*(1/2)  #small error: this multiplies by 1/2; it should be x**(1/2)

Now let's do some unittesting

In [18]:
import unittest #needed for unittesting

#unit testers will always inherit from unittest.TestCase
class TestSqrt(unittest.TestCase):
    
    
    def test_type_error(self):
        #string should raise a TypeError
        s="Aardvarks love almonds"
        with self.assertRaises(TypeError):
            sqrt(s)
            
    def test_standard(self):
        self.assertEqual(sqrt(9),3)
        
    def test_negative(self):
        with self.assertRaises(ValueError):
            sqrt(-1)
    
    

In [19]:
tester=TestSqrt()
tester.test_type_error()

In [20]:
tester.test_standard()

AssertionError: 4.5 != 3

In [21]:
tester.test_negative()

In [22]:
sqrt(3)

1.5
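
The failing test points at the return line. Below is a sketch of how the function might be corrected and the whole suite run at once instead of calling each test method by hand. The names `sqrt_fixed` and `TestSqrtFixed` are made up for this example, and `test_irrational` is an extra test that was not in the original class.

```python
import unittest

def sqrt_fixed(x):
    """Hypothetical corrected version of the sqrt function above."""
    if type(x) not in [int, float]:
        raise TypeError("This function only works with numerical data types")
    elif x < 0:
        raise ValueError("x must be nonnegative")
    else:
        return x ** (1 / 2)  # exponentiation, not multiplication

class TestSqrtFixed(unittest.TestCase):

    def test_type_error(self):
        with self.assertRaises(TypeError):
            sqrt_fixed("Aardvarks love almonds")

    def test_standard(self):
        self.assertEqual(sqrt_fixed(9), 3)

    def test_irrational(self):
        # floating-point results are compared approximately
        self.assertAlmostEqual(sqrt_fixed(2) ** 2, 2)

    def test_negative(self):
        with self.assertRaises(ValueError):
            sqrt_fixed(-1)

# Collect every test_* method and run them together
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSqrtFixed)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Using a loader and runner this way avoids `unittest.main()`, which by default tries to parse the notebook's command-line arguments and exit the interpreter.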