<a href="https://colab.research.google.com/github/Lawrence-Krukrubo/Predicting_California_Housing_Prices/blob/master/part_1_predicting_california_housing_prices.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We shall hand-code a neural network in Python and use it to predict house prices in California. <br> Each hidden layer and the output layer will have the ReLU activation function applied.

See this [link for reference](https://hackernoon.com/build-your-first-neural-network-to-predict-house-prices-with-keras-3fb0839680f4)

The [Universal Function Approximation Theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem) states that a neural network with a single hidden layer and a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$ to arbitrary accuracy, under mild assumptions on the activation function.

## **PART 1: The Needed Modules:**

In [132]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn as sk
import sklearn.linear_model
import scipy
from PIL import Image
from scipy import ndimage
print('all modules imported!')

all modules imported!


## **PART 2: The Data and Preprocessing:**

In [133]:
california_train = pd.read_csv('sample_data/california_housing_train.csv')
california_test = pd.read_csv('sample_data/california_housing_test.csv')
print(f'Shape of training data is: {california_train.shape},\nShape of testing data is: {california_test.shape}')

Shape of training data is: (17000, 9),
Shape of testing data is: (3000, 9)


<h4><b>The Data Dictionary:</b></h4>

1. **longitude:** <br>A measure of how far west a house is; values are negative in California, so a more negative value is farther west

2. **latitude:** <br>A measure of how far north a house is; a higher value is farther north

3. **housing_median_age:** <br>Median age of a house within a block; a lower number is a newer building

4. **total_rooms:** <br>Total number of rooms within a block

5. **total_bedrooms:** <br>Total number of bedrooms within a block

6. **population:** <br>Total number of people residing within a block

7. **households:** <br>Total number of households within a block, where a household is a group of people residing within a home unit

8. **median_income:** <br>Median income for households within a block of houses (measured in tens of thousands of US Dollars)

9. **median_house_value:** <br>Median house value for households within a block (measured in US Dollars)


Let's see the heads of the training and testing sets.

In [134]:
california_train.head()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 |
| 1 | -114.47 | 34.4 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.82 | 80100.0 |
| 2 | -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 |
| 3 | -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 |
| 4 | -114.57 | 33.57 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.925 | 65500.0 |


In [135]:
california_test.head()

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.05 | 37.37 | 27.0 | 3885.0 | 661.0 | 1537.0 | 606.0 | 6.6085 | 344700.0 |
| 1 | -118.3 | 34.26 | 43.0 | 1510.0 | 310.0 | 809.0 | 277.0 | 3.599 | 176500.0 |
| 2 | -117.81 | 33.78 | 27.0 | 3589.0 | 507.0 | 1484.0 | 495.0 | 5.7934 | 270500.0 |
| 3 | -118.36 | 33.82 | 28.0 | 67.0 | 15.0 | 49.0 | 11.0 | 6.1359 | 330000.0 |
| 4 | -119.67 | 36.33 | 19.0 | 1241.0 | 244.0 | 850.0 | 237.0 | 2.9375 | 81700.0 |


Let's inspect the training and testing data to ensure there are no missing values and that each feature has the right data type.

In [136]:
california_train.isna().sum()

longitude             0
latitude              0
housing_median_age    0
total_rooms           0
total_bedrooms        0
population            0
households            0
median_income         0
median_house_value    0
dtype: int64

There are no missing values in the training data set. Let's confirm that each feature has the right data type.

In [137]:
california_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17000 entries, 0 to 16999
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           17000 non-null  float64
 1   latitude            17000 non-null  float64
 2   housing_median_age  17000 non-null  float64
 3   total_rooms         17000 non-null  float64
 4   total_bedrooms      17000 non-null  float64
 5   population          17000 non-null  float64
 6   households          17000 non-null  float64
 7   median_income       17000 non-null  float64
 8   median_house_value  17000 non-null  float64
dtypes: float64(9)
memory usage: 1.2 MB


Awesome! All features have the right data types. Let's do the same for the test data.

In [138]:
california_test.isna().sum()

longitude             0
latitude              0
housing_median_age    0
total_rooms           0
total_bedrooms        0
population            0
households            0
median_income         0
median_house_value    0
dtype: int64

In [139]:
california_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           3000 non-null   float64
 1   latitude            3000 non-null   float64
 2   housing_median_age  3000 non-null   float64
 3   total_rooms         3000 non-null   float64
 4   total_bedrooms      3000 non-null   float64
 5   population          3000 non-null   float64
 6   households          3000 non-null   float64
 7   median_income       3000 non-null   float64
 8   median_house_value  3000 non-null   float64
dtypes: float64(9)
memory usage: 211.1 KB


<h4><b>2.1: Splitting and Re-shaping the data:</b></h4> 

Let's split both the training and testing sets into features (x) and labels (y).

In [140]:
# First, let's get the training and testing sets as NumPy arrays
train_arr = california_train.values
test_arr = california_test.values

# Next, let's create the features and labels for both training and testing sets.
x_train, y_train = train_arr[:,:-1], train_arr[:,-1]
x_test, y_test = test_arr[:,:-1], test_arr[:,-1]

# Let's print the shapes of the training and testing features and labels
print(f'x_train shape is:- {x_train.shape} and y_train shape is {y_train.shape}.')
print(f'x_test shape is:- {x_test.shape} and y_test shape is {y_test.shape}.')

x_train shape is:- (17000, 8) and y_train shape is (17000,).
x_test shape is:- (3000, 8) and y_test shape is (3000,).


Next, let's transpose the training and testing sets so that each column is one example, while making sure we don't end up with rank-1 arrays such as shape `(17000,)` in the process.

In [141]:
x_train = x_train.reshape(x_train.shape[0], -1).T
y_train = y_train.reshape(y_train.shape[0], -1).T
x_test = x_test.reshape(x_test.shape[0], -1).T
y_test = y_test.reshape(y_test.shape[0], -1).T

# Let's print out the shapes again
print(f'x_train shape is:- {x_train.shape} and y_train shape is {y_train.shape}.')
print(f'x_test shape is:- {x_test.shape} and y_test shape is {y_test.shape}.')

x_train shape is:- (8, 17000) and y_train shape is (1, 17000).
x_test shape is:- (8, 3000) and y_test shape is (1, 3000).
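To see why the `reshape` step matters before the transpose, here is a minimal sketch on toy arrays. The names `labels`, `labels_row`, `features` and `features_t` are made up for illustration and are not part of the notebook.

```python
import numpy as np

# A rank-1 array has a single axis, so transposing it changes nothing.
labels = np.array([66900.0, 80100.0, 85700.0])
print(labels.shape, labels.T.shape)

# Reshaping to (m, 1) first makes the transpose a true (1, m) row vector.
labels_row = labels.reshape(labels.shape[0], -1).T
print(labels_row.shape)

# For a 2-D feature matrix the reshape keeps the shape and only .T matters.
features = np.arange(6.0).reshape(3, 2)  # 3 examples, 2 features
features_t = features.reshape(features.shape[0], -1).T
print(features_t.shape)
```

With the labels stored as `(1, m)`, expressions such as `A - Y` later on broadcast element by element instead of silently producing an `(m, m)` matrix.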




<h4><b>2.2: Feature Normalization:</b></h4>

Let's normalize the features using Z-score (standard score) normalization, which subtracts the mean and divides by the standard deviation. Let's define a `Z_score` function.


In [142]:
def Z_score(x):
    """Compute z_score of a distribution.

    @param:
    x is an array or dataframe of ints or floats

    @Return:
    Returns z_score normalisation applied to x
    """
    mean = np.mean(x)
    std = np.std(x)
    zee_score = (x - mean) / std
    
    return zee_score

Now let's apply the z_score normalisation to each feature (row) of the training and testing sets

In [143]:
x_train_norm = np.apply_along_axis(Z_score, 1, x_train)
x_test_norm = np.apply_along_axis(Z_score, 1, x_test)

# Let's confirm they still have the same shape
print(x_train_norm.shape == x_train.shape)
print(x_test_norm.shape == x_test.shape)

True
True
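Note that the cell above standardizes `x_test` with its own mean and standard deviation. A common alternative, shown here only as a hedged sketch and not as what this notebook does, is to compute the statistics on the training set and reuse them for the test set, so both sets are on exactly the same scale. The helpers `fit_standardizer` and `apply_standardizer` and the toy arrays are hypothetical names for illustration.

```python
import numpy as np

def fit_standardizer(x):
    """Return per-feature mean and std for x of shape (n_features, m)."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return mean, std

def apply_standardizer(x, mean, std):
    """Standardize x with previously computed statistics."""
    return (x - mean) / std

# Toy data standing in for x_train and x_test (features in rows).
rng = np.random.default_rng(0)
toy_train = rng.normal(loc=5.0, scale=2.0, size=(3, 100))
toy_test = rng.normal(loc=5.0, scale=2.0, size=(3, 20))

# Statistics come from the training data only.
mean, std = fit_standardizer(toy_train)
toy_train_norm = apply_standardizer(toy_train, mean, std)
toy_test_norm = apply_standardizer(toy_test, mean, std)

print(toy_train_norm.mean(axis=1).round(6))
print(toy_train_norm.std(axis=1).round(6))
```

The training rows come out with mean 0 and standard deviation 1 by construction; the test rows will only be close to that, which is expected.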


In [144]:
# Let's see the first five rows (features) of the x_train_norm array
x_train_norm[:5]

array([[ 2.619365  ,  2.53956878,  2.4946834 , ..., -2.36291168,
        -2.36291168, -2.387848  ],
       [-0.67152023, -0.57326437, -0.90546278, ...,  2.90780067,
         2.88908527,  2.29955006],
       [-1.07967114, -0.76187201, -0.92077158, ..., -0.92077158,
        -0.76187201,  1.85997083],
       [ 1.36169494,  2.29660752, -0.88246225, ...,  0.01529238,
         0.01299867, -0.377848  ],
       [ 1.76420407,  3.23044127, -0.86695622, ..., -0.01995512,
         0.02986848, -0.56801465]])

## **PART 3: Intro to Building a Neural Network From Scratch:**

<h4><b>Building a Logistic Regression model as a Neural Network</b></h4>

**Logistic Regression with a Neural Network mindset:**

I will build a logistic regression classifier as a Neural Network to predict housing prices

Steps include:

1. Do not use loops (for/while) unless absolutely necessary
2. Build the general architecture of a learning algorithm, including:
   - Initializing parameters
   - Calculating the cost function and its gradient
   - Using an optimization algorithm (gradient descent)
3. Gather all three functions above into a main model function, in the right order.


<h4><b>Mathematical expression of the algorithm:</b></h4>

For one example $x^{(i)}$:

$$z^{(i)} = w^T x^{(i)} + b$$

$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)})$$

$$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)})$$

The cost is then computed by averaging the loss over all $m$ training examples:

$$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})$$
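The per-example loss and the averaged cost can be computed without loops. The following is a minimal sketch, assuming activations `a` and labels `y` are both arrays of shape `(1, m)` and that the labels are 0 or 1 as the cross-entropy loss expects; `cross_entropy_cost`, `a_toy` and `y_toy` are illustrative names only.

```python
import numpy as np

def cross_entropy_cost(a, y):
    """Average cross-entropy loss for a and y of shape (1, m), y in {0, 1}."""
    m = y.shape[1]
    # Element-wise loss L(a, y) for every example at once.
    losses = -(y * np.log(a) + (1 - y) * np.log(1 - a))
    # J is the mean of the per-example losses.
    return float(np.sum(losses) / m)

a_toy = np.array([[0.9, 0.2, 0.5]])
y_toy = np.array([[1.0, 0.0, 1.0]])
cost = cross_entropy_cost(a_toy, y_toy)
print(cost)  # about 0.3406
```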

Key steps in this exercise:

- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost  
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude

## **PART 4: Building the Parts of Our Algorithm:**

The main steps for building a Neural Network are:

1. Define the model structure (such as number of input features)
2. Initialize the model's parameters
3. Loop:
   1. Calculate the current loss (forward propagation)
   2. Calculate the current gradient (backward propagation)
   3. Update the parameters (gradient descent)

I will build steps 1-3 separately and integrate them into one function called `model()`.
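As a rough preview of how the three steps fit together, here is a compact sketch of the loop on a tiny toy problem. It is not the notebook's `model()` function; `train_sketch`, its default hyperparameters, and the toy data are assumptions chosen for illustration.

```python
import numpy as np

def train_sketch(x, y, num_iterations=500, learning_rate=0.1):
    """Toy gradient-descent loop for logistic regression.

    x -- features of shape (n_features, m)
    y -- labels of shape (1, m) with entries in {0, 1}
    """
    n_features, m = x.shape
    # Steps 1 and 2: model structure and parameter initialization.
    w = np.zeros((n_features, 1))
    b = 0.0
    costs = []
    # Step 3: the loop.
    for _ in range(num_iterations):
        # Forward propagation: activations and current cost.
        a = 1.0 / (1.0 + np.exp(-(w.T @ x + b)))
        cost = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
        # Backward propagation: gradients of the cost.
        dw = x @ (a - y).T / m
        db = np.sum(a - y) / m
        # Gradient descent update.
        w -= learning_rate * dw
        b -= learning_rate * db
        costs.append(float(cost))
    return w, b, costs

# One feature, four examples: negatives are class 0, positives class 1.
x_toy = np.array([[-2.0, -1.0, 1.0, 2.0]])
y_toy = np.array([[0.0, 0.0, 1.0, 1.0]])
w, b, costs = train_sketch(x_toy, y_toy)
print(costs[0], costs[-1])
```

The first recorded cost is `log(2)` because zero-initialized parameters give every example an activation of 0.5, and the cost falls from there as the weight grows.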

<h4><b>4.1: Helper Function:</b></h4> 

Let's create the sigmoid function that we shall apply to the linear combination $w^T x + b$ of the features

In [145]:
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    s = np.divide(1, 1 + np.exp(-z))

    return s

In [146]:
print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))

sigmoid([0, 2]) = [0.5        0.88079708]
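One practical caveat: for very negative inputs, `np.exp(-z)` can overflow and trigger a runtime warning. Below is a hedged sketch of a numerically safer variant that works on NumPy arrays; `sigmoid_stable` is a hypothetical helper, not part of this notebook.

```python
import numpy as np

def sigmoid_stable(z):
    """Sigmoid of a numpy array that avoids overflow in np.exp for large |z|."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so this form cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)), where exp(z) < 1.
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

values = sigmoid_stable(np.array([-1000.0, 0.0, 2.0, 1000.0]))
print(values)
```

Both branches compute the same mathematical function; the split only decides which exponential is evaluated so that its argument is never a large positive number.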


<h4><b>4.2: Initializing Parameters:</b></h4> 

We initialise the weights and bias parameters. The weights should take the shape of `(num_features, 1)`, while bias should be initialised to `0`. 

In [147]:
def initialise_params(x):
    """
    This function creates a vector of zeros of shape (x.shape[0], 1) for w and initializes b to 0.
    
    Argument:
    x -- an array of features
    
    Returns:
    w -- initialized vector of shape np.zeros((x.shape[0], 1))
    b -- initialized scalar (corresponds to the bias)
    """
    dim = x.shape[0]

    w = np.zeros((dim,1))
    b = 0

    # Let's run some assertions on the shape of w and type of b.
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b

Let's test the `initialise_params` function.
We shall create a random array of shape (5, 3), then apply `initialise_params` to it. We should get `w` as zeros of shape (5, 1) and `b` equal to 0.

In [148]:
t = np.random.rand(5,3)
t

array([[0.21385435, 0.64662146, 0.53206723],
       [0.91203602, 0.71840279, 0.48113843],
       [0.92378689, 0.36428454, 0.81416619],
       [0.48323346, 0.79476478, 0.69067109],
       [0.55703255, 0.33709274, 0.72617963]])

In [149]:
# Unpack both return values from a single call
w, b = initialise_params(t)

# Let's see w and b
print(f'w =\n{w}\n\nb =\n{b}')

w =
[[0.]
 [0.]
 [0.]
 [0.]
 [0.]]

b =
0


Let's confirm that t.shape[0] == w.shape[0]

In [150]:
t.shape[0] == w.shape[0]

True

In [151]:
assert w.shape == (5,1)
print('Yes! w.shape == (5,1)')

Yes! w.shape == (5,1)


<h4><b>4.3: Forward and Backward Propagation:</b></h4> 

Now that the parameters are initialized, I can do the "forward" and "backward" propagation steps for learning the parameters.

I need to implement a function `propagate()` that computes the cost function and its gradient.

Cues:

**Forward Propagation:**

* I get X
* I compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$
* I calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$

Here are the two gradient formulas I will be using:

$$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T$$

$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})$$
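The forward pass and the two gradient formulas translate almost line for line into NumPy. The sketch below is one possible shape for such a function, not necessarily the `propagate()` this notebook ends up with; `propagate_sketch` and the toy inputs are assumptions for illustration. It also compares the analytic `db` against a finite-difference estimate as a sanity check.

```python
import numpy as np

def propagate_sketch(w, b, X, Y):
    """Hypothetical sketch: cost and gradients for logistic regression.

    w -- weights, shape (n_features, 1)
    b -- bias, a scalar
    X -- features, shape (n_features, m)
    Y -- labels in {0, 1}, shape (1, m)
    """
    m = X.shape[1]
    # Forward propagation: activations and average cross-entropy cost.
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    # Backward propagation: dJ/dw = X (A - Y)^T / m, dJ/db = mean(A - Y).
    dw = X @ (A - Y).T / m
    db = np.sum(A - Y) / m
    return {"dw": dw, "db": float(db)}, float(cost)

w_toy = np.array([[0.5], [-0.5]])
b_toy = 0.1
X_toy = np.array([[1.0, 2.0, -1.0], [0.5, -1.5, 2.0]])
Y_toy = np.array([[1.0, 0.0, 1.0]])
grads, cost = propagate_sketch(w_toy, b_toy, X_toy, Y_toy)

# Central finite difference in b should agree with the analytic db.
eps = 1e-6
_, cost_plus = propagate_sketch(w_toy, b_toy + eps, X_toy, Y_toy)
_, cost_minus = propagate_sketch(w_toy, b_toy - eps, X_toy, Y_toy)
db_numeric = (cost_plus - cost_minus) / (2 * eps)
print(grads["db"], db_numeric)
```

If the two printed numbers disagree beyond small rounding error, the gradient code (or the cost) has a bug, which makes this check a cheap test to run before training.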