- Import numpy as np and pandas as pd

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

Define method initialiseNetwork() to initialise the weights with zeros of shape (num_features, 1) and the bias b to zero
- parameters: num_features (number of input features)
- returns: dictionary of weight vector and bias

In [19]:
def initialiseNetwork(num_features):
  W = np.zeros((num_features, 1))
  b = 0
  parameters = {"W": W, "b": b}
  return parameters

Define function sigmoid() for the input z.
- parameters: z
- returns: $1/(1+e^{(-z)})$

In [20]:
def sigmoid(z):
  a =  1/(1 + np.exp(-z))
  return a
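
A quick sanity check of the sigmoid can be sketched as below. The sample values in `z` are illustrative assumptions, not part of the exercise; the check relies on two known properties, sigmoid(0) = 0.5 and sigmoid(-z) = 1 - sigmoid(z).

```python
import numpy as np

def sigmoid(z):
    # Same definition as in the cell above.
    return 1 / (1 + np.exp(-z))

# Illustrative inputs (assumed values, chosen only for the check).
z = np.array([-2.0, 0.0, 2.0])
a = sigmoid(z)

print(a)                                  # values lie strictly between 0 and 1
print(a[1])                               # sigmoid(0) is exactly 0.5
print(np.allclose(sigmoid(-z), 1 - a))    # symmetry around z = 0
```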

Define method forwardPropagation(), which implements forward propagation defined as Z = (W.T dot_product X) + b, A = sigmoid(Z)
- parameters: X, parameters
- returns: A


In [21]:
def forwardPropagation(X, parameters):
  W = parameters["W"]
  b = parameters["b"]
  Z = np.dot(W.T,X) + b
  A = sigmoid(Z)
  return A
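
The shapes involved in forward propagation can be checked with a small sketch. The sizes (4 features, 6 samples) and the random input are assumptions made for illustration; X is laid out as (num_features, num_samples), as the formula Z = W.T X + b requires.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forwardPropagation(X, parameters):
    # Same definition as in the cell above.
    W = parameters["W"]
    b = parameters["b"]
    Z = np.dot(W.T, X) + b
    return sigmoid(Z)

# Illustrative sizes and data (assumed, not from the exercise).
num_features, num_samples = 4, 6
X = np.random.default_rng(0).normal(size=(num_features, num_samples))
parameters = {"W": np.zeros((num_features, 1)), "b": 0}

A = forwardPropagation(X, parameters)
print(A.shape)  # one activation per sample: (1, num_samples)
print(A)        # with zero weights and bias, Z is all zeros, so every entry is 0.5
```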

Define function cost(), which calculates the cost given by −(sum(Y\*log(A)+(1−Y)\*log(1−A)))/num_samples, where \* is the elementwise product
- parameters: A, Y, num_samples (number of samples)
- returns: cost

In [22]:
def cost(A, Y, num_samples):
  cost = -1/num_samples *np.sum(Y*np.log(A) + (1-Y)*(np.log(1-A)))
  return cost
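
As a worked check of the formula: when every activation is 0.5 (which is what zero-initialised weights produce), the cost reduces to ln 2 ≈ 0.6931 regardless of the labels. The sketch below confirms this on assumed toy labels. It also shows `safe_cost`, a hypothetical variant that is not part of the exercise: it clips A away from 0 and 1, one common guard against `np.log(0)`.

```python
import numpy as np

def cost(A, Y, num_samples):
    # Same formula as in the cell above.
    return -1 / num_samples * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def safe_cost(A, Y, num_samples, eps=1e-12):
    # Hypothetical helper (assumption): clip activations so log() never sees 0.
    A = np.clip(A, eps, 1 - eps)
    return cost(A, Y, num_samples)

# Toy labels (assumed) with all activations equal to 0.5.
Y = np.array([[1, 0, 1, 1]])
A = np.full(Y.shape, 0.5)
c = cost(A, Y, Y.shape[1])
print(c, np.log(2))  # the two values agree

# Saturated activations: the unclipped formula would hit 0 * log(0) here.
A_sat = np.array([[1.0, 0.0]])
Y_sat = np.array([[1, 0]])
c_sat = safe_cost(A_sat, Y_sat, 2)
print(c_sat)  # finite and close to zero
```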

Define method backPropagration() to get the derivatives of the weights and bias
- parameters: X, Y, A, num_samples
- returns: dW, db

In [23]:
def backPropagration(X, Y, A, num_samples):
  dZ =  A - Y                          
  dW =  (np.dot(X,dZ.T))/num_samples                        #(X dot_product dZ.T)/num_samples
  db =  np.sum(dZ)/num_samples                              #sum(dZ)/num_samples
  return dW, db
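
One way to gain confidence in these derivatives is a finite-difference gradient check. The following is a minimal sketch under stated assumptions: the helper names `cost_at` and `gradients_at`, the random toy data, and the step size `eps` are all hypothetical and chosen for illustration. It compares the analytic dW and db with central-difference estimates of the cost's slope.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_at(W, b, X, Y):
    # Hypothetical helper: forward pass plus the cost formula in one call.
    A = sigmoid(np.dot(W.T, X) + b)
    m = X.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

def gradients_at(W, b, X, Y):
    # Same formulas as the backpropagation cell: dZ = A - Y, and so on.
    A = sigmoid(np.dot(W.T, X) + b)
    m = X.shape[1]
    dZ = A - Y
    return np.dot(X, dZ.T) / m, np.sum(dZ) / m

# Random toy problem (assumed data).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = rng.integers(0, 2, size=(1, 5))
W = rng.normal(size=(3, 1))
b = 0.1

dW, db = gradients_at(W, b, X, Y)

# Central differences: perturb one parameter at a time.
eps = 1e-6
dW_num = np.zeros_like(W)
for j in range(W.shape[0]):
    step = np.zeros_like(W)
    step[j, 0] = eps
    dW_num[j, 0] = (cost_at(W + step, b, X, Y) - cost_at(W - step, b, X, Y)) / (2 * eps)
db_num = (cost_at(W, b + eps, X, Y) - cost_at(W, b - eps, X, Y)) / (2 * eps)

print(np.max(np.abs(dW - dW_num)))  # should be tiny
print(abs(db - db_num))             # should be tiny
```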
  

Define function updateParameters() to update the current parameters using their derivatives:  
w = w - learning_rate \* dw  
b = b - learning_rate \* db  
- parameters: parameters, dW, db, learning_rate
- returns: dictionary of updated parameters

In [24]:
def updateParameters(parameters, dW, db, learning_rate):
  W = parameters["W"] - (learning_rate * dW)
  b = parameters["b"] - (learning_rate * db)
  return {"W": W, "b": b}
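
A single update step can be traced by hand. The parameter and gradient values below are assumed purely for illustration; each parameter moves against its gradient, scaled by the learning rate.

```python
import numpy as np

def updateParameters(parameters, dW, db, learning_rate):
    # Same definition as in the cell above.
    W = parameters["W"] - (learning_rate * dW)
    b = parameters["b"] - (learning_rate * db)
    return {"W": W, "b": b}

# Assumed starting values and gradients.
parameters = {"W": np.array([[1.0], [2.0]]), "b": 0.5}
dW = np.array([[0.5], [-1.0]])
db = 0.25

updated = updateParameters(parameters, dW, db, learning_rate=0.5)
print(updated["W"].ravel())  # 1.0 - 0.5*0.5 = 0.75 and 2.0 - 0.5*(-1.0) = 2.5
print(updated["b"])          # 0.5 - 0.5*0.25 = 0.375
```

The function returns a new dictionary, so the `parameters` passed in is left unchanged.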
  

Define the model, which trains the network by repeating forward propagation, backpropagation, and parameter updates
- parameters: X, Y, num_iter (number of iterations), learning_rate
- returns: parameters (dictionary of updated weights and bias)

In [25]:
def model(X, Y, num_iter, learning_rate):
  num_features = X.shape[0]
  num_samples = X.shape[1]
  parameters = initialiseNetwork(num_features)                            #call initialiseNetwork()
  for i in range(num_iter):
    A = forwardPropagation(X, parameters)                              # calculate final output A from forwardPropagation()
    if(i%100 == 0):
      print("cost after {} iteration: {}".format(i, cost(A, Y, num_samples)))
    dW, db =  backPropagration(X, Y, A, num_samples)              # calculate  derivatives from backpropagation
    parameters = updateParameters(parameters, dW, db, learning_rate)     # update parameters
  return parameters
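
To see the training loop converge on something small, here is a compact sketch on synthetic data. `train_logistic` is a hypothetical helper that runs the same loop as model() but returns the cost history instead of printing it; the data, iteration count, and learning rate are assumptions chosen for the demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logistic(X, Y, num_iter, learning_rate):
    # Hypothetical helper: same steps as model(), with the cost recorded each iteration.
    num_features, num_samples = X.shape
    W = np.zeros((num_features, 1))
    b = 0.0
    costs = []
    for _ in range(num_iter):
        A = sigmoid(np.dot(W.T, X) + b)                      # forward propagation
        costs.append(-np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / num_samples)
        dZ = A - Y                                           # backpropagation
        W = W - learning_rate * np.dot(X, dZ.T) / num_samples
        b = b - learning_rate * np.sum(dZ) / num_samples
    return {"W": W, "b": b}, costs

# Synthetic, linearly separable data (assumed): label is 1 when x0 + x1 > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
Y = (X[0] + X[1] > 0).astype(int).reshape(1, -1)

parameters, costs = train_logistic(X, Y, num_iter=200, learning_rate=0.5)
print(costs[0], costs[-1])  # starts at ln 2 and ends lower
```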
    
    
  

- Run the below cell to define the function that predicts the output. It takes the updated parameters and the input data as function parameters and returns the predicted output.

In [26]:
def predict(W, b, X):
  Z = np.dot(W.T,X) + b
  Y = np.array([1 if y > 0.5 else 0 for y in sigmoid(Z[0])]).reshape(1,len(Z[0]))
  return Y
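
The thresholding in predict() can also be written without a Python loop. The sketch below compares the notebook's version with `predict_vectorised`, a hypothetical alternative that is not part of the exercise; the random weights and inputs are assumed for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(W, b, X):
    # Same definition as in the cell above.
    Z = np.dot(W.T, X) + b
    Y = np.array([1 if y > 0.5 else 0 for y in sigmoid(Z[0])]).reshape(1, len(Z[0]))
    return Y

def predict_vectorised(W, b, X):
    # Hypothetical alternative: threshold the whole activation row at once.
    Z = np.dot(W.T, X) + b
    return (sigmoid(Z) > 0.5).astype(int)

# Assumed toy parameters and inputs.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 1))
b = -0.2
X = rng.normal(size=(3, 8))

same = np.array_equal(predict(W, b, X), predict_vectorised(W, b, X))
print(same)
```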

- The code in the below cell loads the breast cancer data set from sklearn.
- The input variable (X_cancer) contains measurements of the tumor cells, and the target variable (y_cancer) classifies each tumor as malignant (0) or benign (1).

In [27]:
(X_cancer, y_cancer) = load_breast_cancer(return_X_y = True)


- Split the data into train and test sets using train_test_split(). Set the random state to 25. Refer to the code snippet in topic 4.

In [28]:
X_train, X_test, y_train, y_test = train_test_split(X_cancer, y_cancer,
                                                   random_state = 25)

Since the tumor measurements are not on a uniform scale, you need to normalize the data before feeding it to the network
- The below function is used to normalize the input data.

In [29]:
def normalize(data):
  col_max = np.max(data, axis = 0)
  col_min = np.min(data, axis = 0)
  return np.divide(data - col_min, col_max - col_min)

- Normalize X_train and X_test and assign the results to X_train_n and X_test_n respectively

In [30]:
X_train_n = normalize(X_train)
X_test_n = normalize(X_test)
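
The cell above scales each set with its own column minimum and maximum. A commonly used alternative, shown here only as a sketch and not as the exercise's method, reuses the training set's minimum and maximum for the test set so both are mapped by the same transformation. `normalize_with` and the toy arrays are hypothetical.

```python
import numpy as np

def normalize_with(data, col_min, col_max):
    # Hypothetical helper: min-max scaling with externally supplied statistics.
    return (data - col_min) / (col_max - col_min)

# Toy data (assumed): rows are samples, columns are features.
train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
test = np.array([[2.0, 20.0], [6.0, 40.0]])

# Statistics come from the training set only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)

train_n = normalize_with(train, col_min, col_max)
test_n = normalize_with(test, col_min, col_max)
print(train_n)  # every column spans 0 to 1
print(test_n)   # values may fall outside [0, 1], e.g. (6 - 1) / 4 = 1.25
```

Switching to this scheme would change the trained model's test inputs, so the accuracy printed later in the notebook would not necessarily be reproduced.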

- Transpose X_train_n and X_test_n so that rows represent features and columns represent samples
- Reshape y_train and y_test into row vectors whose length is equal to the number of samples. Use np.reshape()



In [31]:
X_trainT = X_train_n.T
X_testT = X_test_n.T
y_trainT = y_train.reshape(1,X_train_n.T.shape[1])
y_testT =  y_test.reshape(1,X_testT.shape[1])

Train the network using X_trainT and y_trainT with 4000 iterations and a learning rate of 0.75

In [32]:
parameters = model(X_trainT, y_trainT, 4000, 0.75)                # call the model() function with the parameters mentioned in the above cell

cost after 0 iteration: 0.6931471805599453
cost after 100 iteration: 0.24382767353051085
cost after 200 iteration: 0.18414919195134818
cost after 300 iteration: 0.1565873493485997
cost after 400 iteration: 0.1396752246321806
cost after 500 iteration: 0.1278729526958286
cost after 600 iteration: 0.11900887751136768
cost after 700 iteration: 0.1120266707270078
cost after 800 iteration: 0.10633924623930974
cost after 900 iteration: 0.10158933661241841
cost after 1000 iteration: 0.09754476494426205
cost after 1100 iteration: 0.0940469433647547
cost after 1200 iteration: 0.09098323338346236
cost after 1300 iteration: 0.08827107206470108
cost after 1400 iteration: 0.08584834873491792
cost after 1500 iteration: 0.0836673076013795
cost after 1600 iteration: 0.08169053991796828
cost after 1700 iteration: 0.07988826663984762
cost after 1800 iteration: 0.0782364464730404
cost after 1900 iteration: 0.07671542796224082
cost after 2000 iteration: 0.07530896965280097
cost after 2100 iteration: 0.0740

Predict the output for the train and test data (X_trainT and X_testT) using the predict() method. Use the parameters returned from the trained model


In [33]:
yPredTrain = predict(parameters["W"], parameters["b"], X_trainT)   # pass weights and bias from the parameters dictionary and X_trainT as input to the function
yPredTest = predict(parameters["W"], parameters["b"], X_testT)    # pass the same parameters but X_testT as input data

**Run the below cell to print the accuracy of the model on the train and test data and save your answers. Do not modify the cell.**

In [34]:
a1=round(100 - np.mean(np.abs(yPredTrain - y_train)) * 100,2)
a2=round(100 - np.mean(np.abs(yPredTest - y_test) * 100),2)
print("train accuracy: {} %".format(a1))
print("test accuracy: {} %".format(a2))

import hashlib
import pickle
def gethex(ovalue):
  hexresult=hashlib.md5(str(ovalue).encode())
  return hexresult.hexdigest()


def pickle_ans1(output):
  hexresult=gethex(output)
  with open('output/output1.pkl', 'wb') as file: 
    pickle.dump(hexresult,file)

def pickle_ans2(output):
  hexresult=gethex(output)
  with open('output/output2.pkl', 'wb') as file: 
    pickle.dump(hexresult,file)


pickle_ans1(a1)
pickle_ans2(a2)



train accuracy: 98.59 %
test accuracy: 93.01 %
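
The accuracy expression used above, 100 minus the mean absolute difference times 100, equals the percentage of matching labels when both arrays contain only 0s and 1s. The sketch below illustrates this on assumed toy labels; the (1, n) prediction array broadcasts against the (n,) label array in the same way as in the cell above.

```python
import numpy as np

# Toy labels (assumed): 4 of the 5 predictions match.
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([[1, 0, 0, 1, 0]])  # shape (1, 5), like the output of predict()

# Formula from the notebook cell.
acc_abs = 100 - np.mean(np.abs(y_pred - y_true)) * 100

# Equivalent view: fraction of positions where prediction and label agree.
acc_eq = np.mean(y_pred.ravel() == y_true) * 100

print(acc_abs, acc_eq)  # both correspond to 4 correct out of 5
```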
