## Python/NumPy Workshop

In this workshop you will learn to use NumPy, a Python library that scripts for machine learning models depend on heavily because it makes data processing intuitive.


In [13]:
# 1. Import the numpy package under the name np. You will need to do this step every time you want to use NumPy!
# 2. Create a rank 1 array (1D) with the values 1 through 3. Call it Z, and print it.
# 3. Change the first value of Z to a 5.
# 4. Sort Z, and then print it.
#########################
import numpy as np

Z = np.array([1, 2, 3])
print(Z)
Z[0] = 5
Z.sort()
print(Z)


[1 2 3]
[2 3 5]
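
The cell above sorts `Z` in place with `Z.sort()`. As a small sketch of the alternative, `np.sort` returns a sorted copy and leaves the original array alone. The variable names `sorted_copy`, `before`, and `return_value` are illustrative only.

```python
import numpy as np

Z = np.array([5, 2, 3])

# np.sort returns a new, sorted array; Z keeps its original order.
sorted_copy = np.sort(Z)
before = Z.tolist()
print(sorted_copy)
print(Z)

# Z.sort() reorders Z itself and returns None, so do not write Z = Z.sort().
return_value = Z.sort()
print(return_value)
print(Z)
```

Use `np.sort` when the original order is still needed later, and `Z.sort()` when it is not.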


In [14]:
# 1. Create a null (all-zero) array of size 10 called Z.
# 2. Make the 5th value of Z equal to 1 (index 4, because indexing starts at 0).
# 3. Print Z
##############################

Z = np.zeros(10)
Z[4] = 1
print(Z)


[ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]


In [16]:
# Print Z after every step below
# 1. Create a 2 x 2 array with all zeros. Call it Z.
# 2. Set Z to a 2 x 2 array of all ones.
# 3. Set Z to a 2 x 2 array of all 4's.
# 4. Set Z to a 2 x 2 identity matrix.
##############################
Z = np.zeros((2,2))
print(Z)
Z = np.ones((2,2))
print(Z)
Z = np.full((2,2), 4)
print(Z)
Z = np.eye(2)
print(Z)

[[ 0.  0.]
 [ 0.  0.]]
[[ 1.  1.]
 [ 1.  1.]]
[[4 4]
 [4 4]]
[[ 1.  0.]
 [ 0.  1.]]
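
In the output above, the array of 4's prints without decimal points while the others print as floats. A minimal sketch of why: `np.zeros`, `np.ones`, and `np.eye` default to a float dtype, whereas `np.full` takes its dtype from the fill value unless `dtype` is given. The exact integer type depends on the platform and NumPy version, so it is not shown here.

```python
import numpy as np

zeros = np.zeros((2, 2))
fours = np.full((2, 2), 4)                     # dtype inferred from the integer 4
fours_float = np.full((2, 2), 4, dtype=float)  # force a float array instead

print(zeros.dtype)        # float64
print(fours.dtype)        # an integer dtype (platform dependent)
print(fours_float.dtype)  # float64
```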


In [25]:
# 1. Create a rank 2 array (2D) with 3 columns that looks like [[0, 1, 2], [3, 4, 5]]. Call it Z and print it.
# 2. Print the shape of Z.
# 3. Initialize an array X with the values 0 through 5. Print X. Print the shape of X.
# 4. Transform X into a rank 2 array with 3 columns. Print X. Print the shape of X.
#########################

Z = np.array([[0,1,2], [3, 4, 5]])
print(Z)
print(Z.shape)

X = np.arange(6)
print(X)
print(X.shape)

X = X.reshape(2, 3)
print(X)
print(X.shape)


[[0 1 2]
 [3 4 5]]
(2, 3)
[0 1 2 3 4 5]
(6,)
[[0 1 2]
 [3 4 5]]
(2, 3)
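
The reshape step above spells out both dimensions. As a short sketch, one dimension can instead be passed as `-1`, in which case NumPy infers it from the array's size. The names `two_rows`, `three_cols`, and `flat` are illustrative.

```python
import numpy as np

X = np.arange(6)

two_rows = X.reshape(2, -1)    # 2 rows, column count inferred
three_cols = X.reshape(-1, 3)  # 3 columns, row count inferred
flat = two_rows.reshape(-1)    # back to a rank 1 array

print(two_rows.shape)
print(three_cols.shape)
print(flat.shape)

# A shape that does not divide the number of elements raises ValueError.
try:
    X.reshape(4, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)
```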


In [8]:
# 1. Create a 5 x 5 matrix and initialize it with random numbers. Call it X.
# 2. Create a 3 x 3 matrix with values ranging from 0-8. Call it Y.
# 3. Print X and Y:
#########################
X = np.random.random((5,5))
Y = np.arange(9).reshape(3,3)
print(X)
print(Y)

[[ 0.10142452  0.48122312  0.45958066  0.56310389  0.60815282]
 [ 0.58479921  0.84664799  0.98498617  0.47684612  0.44329216]
 [ 0.23598511  0.04946631  0.57369088  0.87004661  0.85613233]
 [ 0.8709709   0.63800537  0.06996201  0.09877366  0.57589709]
 [ 0.91404372  0.94676893  0.21533946  0.49332761  0.77286546]]
[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [36]:
# 1. Create Z, a rank one array with range 0 - 9.
# 2. Print out the first 5 values of Z
# 3. Print out the last 5 values of Z (bonus: use negative indexing)
# 4. Print out the middle 4 values of Z
###########################
Z = np.arange(10)
print(Z[:5])
print(Z[-5:])
print(Z[3:7])


[0 1 2 3 4]
[5 6 7 8 9]
[3 4 5 6]
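
The middle slice above hard-codes the indices 3 and 7. A possible generalization, assuming "middle" means a window centered in the array, computes the start index from the array size. The names `n`, `start`, and `middle` are illustrative, and the step slices are included only to show the third slice argument.

```python
import numpy as np

Z = np.arange(10)

n = 4                         # how many middle values to take
start = (Z.size - n) // 2     # index where the centered window begins
middle = Z[start:start + n]
print(middle)

# A slice also accepts a step: every other value, and the array reversed.
every_other = Z[::2]
reversed_Z = Z[::-1]
print(every_other)
print(reversed_Z)
```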


In [42]:
# 1. Create the following array Z (rank 2) with shape (4,3).
    # [[1 2 3] 
    #  [4 5 6]
    #  [7 8 9]
    #  [10 11 12]]
# 2. Print the shape of Z
# 3. Use slicing to create a subarray X which consists of the last 2 rows and last 2 columns of Z. Print X.
# 4. Change the value in the 4th row, 3rd column of Z (Z[3, 2] with zero-based indexing) to -100 by modifying X.
# 5. Print X
# 6. Print Z
##############################
Z = np.array([[1,2,3], [4,5,6], [7, 8, 9], [10, 11, 12]])
print(Z.shape)
X = Z[2: , 1:]
print(X)
X[1,1] = -100
print(X)
print(Z)


(4, 3)
[[ 8  9]
 [11 12]]
[[   8    9]
 [  11 -100]]
[[   1    2    3]
 [   4    5    6]
 [   7    8    9]
 [  10   11 -100]]
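
The output above shows that writing to `X` also changed `Z`: a basic slice is a view onto the same memory, not a new array. As a minimal sketch, calling `.copy()` on the slice gives an independent array. The names `view` and `independent` are illustrative.

```python
import numpy as np

Z = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

view = Z[2:, 1:]                # shares memory with Z
independent = Z[2:, 1:].copy()  # has its own memory

# Writing to the copy leaves Z untouched.
independent[1, 1] = -100
value_after_copy_write = int(Z[3, 2])
print(value_after_copy_write)

# Writing to the view changes Z as well.
view[1, 1] = -100
value_after_view_write = int(Z[3, 2])
print(value_after_view_write)

print(np.shares_memory(Z, view))
print(np.shares_memory(Z, independent))
```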


###### The following example resembles what you would see when setting up a linear regression model. The data comes from the Boston Housing data set.

In [48]:
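# Import the Boston data set.
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older version.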
from sklearn.datasets import load_boston
boston = load_boston()

print(boston.DESCR)

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [56]:
# Investigate shape of the input data array
data = boston.data
target = boston.target

print(data.shape)
print(target.shape)

num_features = len(boston.feature_names)
num_samples = data.shape[0] # 506 training examples

# 1. Normalize data so that each feature is mean-centered and bounded by 1. This means that for each of the num_features features,
    # you calculate the mean and the max over all samples, then subtract the mean from every data point and divide by the max.
    # Hint: use the np.amax() function to find the max data point for a feature.
####################################

for i in range(num_features):
    feature_avg = np.mean(data[:, i])
    feature_max = np.amax(data[:, i])
    data[:, i] = (data[:, i]-feature_avg)/feature_max


(506, 13)
(506,)
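
The loop above normalizes one feature at a time. The same computation can be sketched without a loop by taking the mean and max along `axis=0` and letting broadcasting apply them to every row. This sketch uses randomly generated data of the same shape as a stand-in, not the real Boston values; the names `rng`, `normalized`, and `looped` are illustrative.

```python
import numpy as np

# Hypothetical stand-in for boston.data: 506 samples, 13 positive features.
rng = np.random.default_rng(0)
data = rng.random((506, 13)) * 100

# Per-feature statistics, each of shape (13,).
feature_avg = data.mean(axis=0)
feature_max = data.max(axis=0)

# Broadcasting stretches the (13,) vectors across all 506 rows.
normalized = (data - feature_avg) / feature_max

# Cross-check against the column-by-column loop from the cell above.
looped = data.copy()
for i in range(looped.shape[1]):
    col_avg = np.mean(looped[:, i])
    col_max = np.amax(looped[:, i])
    looped[:, i] = (looped[:, i] - col_avg) / col_max

print(normalized.shape)
print(np.allclose(normalized, looped))
```

Unlike the loop in the cell above, which overwrites `boston.data` in place, this version writes the result to a new array, so the raw values stay available.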
