It is not possible for a single perceptron, or for any network of perceptrons that uses only linear activation functions, to compute the XOR function. A perceptron takes a bias and a number of weighted inputs and sums them. If the sum is greater than (or less than, depending on setup) a certain threshold, then the perceptron activates. While this works for AND and OR, it does not work for XOR because AND and OR are linearly separable, but XOR is not. For AND and OR, a single threshold on the weighted sum separates the active cases from the inactive ones. For example, in a two-input AND perceptron with weight 1 on both inputs, the perceptron will only be active if both inputs are high (threshold = 2). Likewise, a two-input OR perceptron with the same weights will be active if the sum of weights times inputs is one or more (threshold = 1), indicating that one or more inputs is high. However, there is no threshold that works for XOR because, with these weights, it needs to be inactive in the lowest-sum case (both inputs = 0) and the highest-sum case (both inputs = 1), but active in between; other choices of weights run into the same problem. Any threshold used for XOR would either include a case in which the perceptron should be inactive or exclude one in which it should be active. Stacking more linear layers does not help, because a composition of linear functions is itself linear. Consequently, perceptrons with linear activation functions cannot compute XOR.
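
The separability argument can be checked by brute force. The sketch below is only an illustration, not a proof: it searches a coarse grid of weights and thresholds (the grid range and the helper names `perceptron` and `find_single_perceptron` are assumptions made for this example) for a single threshold unit that reproduces each truth table.

```python
import itertools

# The four possible input pairs for a two-input Boolean function.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Target outputs for each function, in the same order as INPUTS.
TARGETS = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}


def perceptron(x1, x2, w1, w2, threshold):
    """Return 1 when the weighted sum reaches the threshold, else 0."""
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0


def find_single_perceptron(target):
    """Search a coarse grid for (w1, w2, threshold) that reproduces target."""
    grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
    for w1, w2, threshold in itertools.product(grid, repeat=3):
        outputs = [perceptron(x1, x2, w1, w2, threshold) for x1, x2 in INPUTS]
        if outputs == target:
            return w1, w2, threshold
    return None  # nothing on the grid matches this truth table


for name, target in TARGETS.items():
    print(name, find_single_perceptron(target))
```

The search finds settings for AND and OR and returns `None` for XOR. A finite grid cannot rule out every real-valued setting, so this only supports the linear-separability argument rather than replacing it.
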
To create a network of perceptrons that can compute XOR, two things are needed. The first is a hidden layer with a non-linear activation function, e.g. the ReLU function, the tanh function, or the sigmoid function. This allows the network to accommodate the non-linear nature of the XOR function. The second is a way to learn the weights, which backpropagation provides: the network derives corrections from incorrect results and adjusts the weights placed on its inputs as the model learns. For a two-input problem, a sufficient network has two hidden nodes and one output node. In one common solution, one of the hidden nodes moves toward reflecting OR and the other toward reflecting AND. The output node should put a weight of 1 on the output of the OR node and a weight of -1 on the output of the AND node. Therefore, if both inputs are 0, the output node will receive 0 from both the AND and OR nodes, and will output 0. If exactly one input is 1, then the OR node will output 1 and the AND node will output 0, and the output node will output 1 (1 * 1 + -1 * 0). If both inputs are 1, then both the OR and AND nodes will output 1, and the output node will return 0 (1 * 1 + -1 * 1). Together, a non-linear activation and backpropagation should allow a network of perceptrons to learn the XOR function.
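
The OR/AND construction above can be written out directly. This is a minimal sketch with hand-set weights rather than weights learned by backpropagation; it uses a step function as the non-linearity, and the output threshold of 1 is an assumption chosen for this example.

```python
def step(weighted_sum, threshold):
    """Step activation: 1 when the weighted sum reaches the threshold."""
    return 1 if weighted_sum >= threshold else 0


def xor_network(x1, x2):
    """Two hidden nodes (OR and AND) feeding one output node."""
    or_node = step(1 * x1 + 1 * x2, threshold=1)   # active if any input is 1
    and_node = step(1 * x1 + 1 * x2, threshold=2)  # active only if both are 1
    # Weight 1 on the OR node and -1 on the AND node, as described above.
    return step(1 * or_node + -1 * and_node, threshold=1)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_network(x1, x2))
# prints:
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

A trained network with sigmoid or tanh units would settle on real-valued weights that approximate this behavior rather than on these exact integers.
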

In [4]:
# Question 2

# Part 1
import pandas as pd
import numpy as np

boston_housing_dataset = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv")
boston_housing_dataset.describe()

statistic,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
count,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0
mean,3.613524,11.363636,11.136779,0.06917,0.554695,6.284634,68.574901,3.795043,9.549407,408.237154,18.455534,356.674032,12.653063,22.532806
std,8.601545,23.322453,6.860353,0.253994,0.115878,0.702617,28.148861,2.10571,8.707259,168.537116,2.164946,91.294864,7.141062,9.197104
min,0.00632,0.0,0.46,0.0,0.385,3.561,2.9,1.1296,1.0,187.0,12.6,0.32,1.73,5.0
25%,0.082045,0.0,5.19,0.0,0.449,5.8855,45.025,2.100175,4.0,279.0,17.4,375.3775,6.95,17.025
50%,0.25651,0.0,9.69,0.0,0.538,6.2085,77.5,3.20745,5.0,330.0,19.05,391.44,11.36,21.2
75%,3.677082,12.5,18.1,0.0,0.624,6.6235,94.075,5.188425,24.0,666.0,20.2,396.225,16.955,25.0
max,88.9762,100.0,27.74,1.0,0.871,8.78,100.0,12.1265,24.0,711.0,22.0,396.9,37.97,50.0


In [5]:
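# Part 2
# reindex() returns a new DataFrame, so the shuffled result must be assigned back.
boston_housing_dataset = boston_housing_dataset.reindex(
    np.random.permutation(boston_housing_dataset.index))
# Split the shuffled rows by position: 300 / 100 / remaining 106.
test_set = boston_housing_dataset[:300]
validation_set = boston_housing_dataset[300:400]
verification_set = boston_housing_dataset[400:]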
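
A small illustration of how `DataFrame.reindex` behaves when shuffling rows, using a hypothetical 10-row frame (`toy`, `perm`, and the split names are made up for this sketch and are not part of the assignment data). It shows that `reindex` returns a new frame and leaves the original untouched, so the result has to be assigned before slicing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing data: 10 rows, one column.
toy = pd.DataFrame({"value": np.arange(10)})

# Fixed seed (an assumption for repeatability) and a shuffled copy of the index.
rng = np.random.default_rng(seed=0)
perm = rng.permutation(toy.index.to_numpy())

toy.reindex(perm)             # result discarded: toy keeps its original order
shuffled = toy.reindex(perm)  # assigned: rows now follow the shuffled index

# Positional slices of the shuffled frame give three disjoint subsets.
first_split = shuffled.iloc[:6]
second_split = shuffled.iloc[6:8]
third_split = shuffled.iloc[8:]

print(list(toy.index))
print(list(shuffled.index))
print(len(first_split), len(second_split), len(third_split))
```

Fixing the seed makes the split reproducible between runs, which helps when comparing models trained on the same subsets.
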


In [None]:
# Part 3
boston_housing_dataset['school_investment'] = boston_housing_dataset['ptratio'] / boston_housing_dataset['tax']

'''
This synthetic feature relates the amount of investment that a given area
puts into education, represented here by the pupil-teacher ratio, to the available
funds, represented by the tax rate per $10,000 of housing value. The feature
is computed by dividing the pupil-teacher ratio by the tax rate. The goal of this
synthetic feature is to examine the relationship between the amount a community
invests in its education system and the amount that the houses in the community are worth.
'''