# Energy Efficiency Prediction

## Introduction and objectives 

This dataset is obtained from the UCI Machine Learning Repository:

https://archive.ics.uci.edu/ml/datasets/Energy+efficiency 

There are several other online repositories that provide datasets for machine learning.
See, for example, https://machinelearningmastery.com/a-guide-to-getting-datasets-for-machine-learning-in-python/ 

The objective is to predict the heating and cooling load requirements (i.e. energy efficiency) given the building characteristics.

In the dataset, there are 768 samples describing different building shapes with eight characteristics:

    X1 Relative Compactness
    X2 Surface Area
    X3 Wall Area
    X4 Roof Area
    X5 Overall Height
    X6 Orientation
    X7 Glazing Area
    X8 Glazing Area Distribution
    
The corresponding outputs/labels are given as:

    Y1 Heating Load
    Y2 Cooling Load


## Question 1: Elementary question               



### 1. Import necessary python libraries and setup the notebook

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd 
import matplotlib
import matplotlib.pyplot as plt

# Routines for linear regression
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

# Required for splitting the dataset 
from sklearn.model_selection import train_test_split

# Set label size for plots
matplotlib.rc('xtick', labelsize=14) 
matplotlib.rc('ytick', labelsize=14)


### 2. Load and split the dataset

The next code segment reads the ENB2012 dataset into a pandas data frame using `pandas.read_csv`.
More information with examples can be found at https://machinelearningmastery.com/massaging-data-using-pandas/ 

Also keep in mind that there are several other ways to read data in Python, e.g. using `numpy.loadtxt()`, `numpy.genfromtxt()`, `pandas.read_csv()`, etc. Read more at https://machinelearningmastery.com/load-machine-learning-data-python/ 
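
As a rough illustration of two of these readers, the sketch below parses the same small in-memory CSV with `pandas.read_csv` and `numpy.genfromtxt`. The CSV text and the names `csv_text`, `df_demo` and `arr_demo` are made up for this example; they are not the ENB2012 file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content with a header row (not the real dataset)
csv_text = "X1,X2,Y1\n0.98,514.5,15.55\n0.90,563.5,20.84\n0.86,588.0,19.50\n"

# pandas keeps the header as column names and returns a DataFrame
df_demo = pd.read_csv(io.StringIO(csv_text))

# numpy returns a plain float array; the header row is skipped explicitly
arr_demo = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(df_demo.shape, arr_demo.shape)
```

pandas is usually more convenient when column names matter, while the numpy readers suit purely numeric files.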

Note: the split is random, so every time you run the program you obtain different results.
To ensure that you always get the same results, set the seed of the random generator (for example, through the `random_state` argument of `train_test_split`). 
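
The effect of the seed can be checked with a minimal sketch like the one below, which splits a toy array twice with the same `random_state`. The array `data` and the variable names are assumptions made for the illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(20).reshape(10, 2)  # toy data with 10 rows

# The same seed gives an identical split on every call
train_a, test_a = train_test_split(data, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(data, test_size=0.2, random_state=42)

print(np.array_equal(train_a, train_b), np.array_equal(test_a, test_b))
```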


In [None]:
columns = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'Y1', 'Y2']
df = pd.read_csv('ENB2012-data.csv', names = columns, skiprows = 1)
train, test = train_test_split(df, test_size=0.2, random_state=42)
print(df.head())

### 3. Explore your training dataset 

#### 3.1 Descriptive statistics 

Get a statistical summary of the dataset using `pandas.DataFrame.describe` and the correlation between each pair of variables using `pandas.DataFrame.corr`.

In [None]:
# Print statistical summary of the train dataset

# YOUR CODE GOES HERE
print(train.describe())


# END of YOUR CODE

In [None]:
# Print correlation of variables using the train dataset 

# YOUR CODE GOES HERE
print(train.corr())
     

# END of YOUR CODE

#### 3.2 Scatter diagrams
It is important to explore the dataset before conducting any machine learning work. For example, draw scatter diagrams for each predictor vs. each target. 

In [None]:
# One row per predictor (X1..X8), one column per target (Y1, Y2)
n_features = len(train.columns) - 2
fig, axis = plt.subplots(n_features, 2, figsize=(15, 15))
for i in range(2):
    for j in range(n_features):
        x_axis = train[train.columns[j]]
        y_axis = train[train.columns[n_features + i]]
        axis[j][i].scatter(x_axis, y_axis)
        axis[j][i].set_title("Y" + str(i + 1) + " vs X" + str(j + 1))

fig.tight_layout()

### 4. Predict target without using predictors

Compute the mean of Y1 and Y2 to predict each target without knowledge of predictors. 
In this case, the mean squared error (MSE) of the prediction on the training set is simply the (population) variance of the target.

In [None]:
# Compute the mean and variance for each target variable 

# YOUR CODE GOES HERE

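# ddof=0 gives the population variance, which equals the MSE of predicting the mean
print('Prediction of Y1: %.4f' % train["Y1"].mean())
print('MSE: %.4f' % train["Y1"].var(ddof=0))
print('Prediction of Y2: %.4f' % train["Y2"].mean())
print('MSE: %.4f' % train["Y2"].var(ddof=0))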
# END of YOUR CODE
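
The link between the MSE of the mean predictor and the variance can be verified on a toy example. The sketch below uses made-up target values (`y_demo`); note that the equality holds for the population variance (`ddof=0`), whereas the pandas default for `var()` is the sample variance (`ddof=1`), which is slightly larger.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

y_demo = pd.Series([10.0, 12.0, 15.0, 23.0])  # toy target values

# Predict the mean for every sample
y_pred_mean = np.full(len(y_demo), y_demo.mean())
mse_mean = mean_squared_error(y_demo, y_pred_mean)

var_population = y_demo.var(ddof=0)  # divides by m
var_sample = y_demo.var()            # pandas default, divides by m - 1

print(mse_mean, var_population, var_sample)
```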

### 5. Predict target using a single predictor

To fit a linear regression model, use `sklearn.linear_model.LinearRegression()` and complete the code snippet below to define a function, `one_feature_regressor`, that takes `x_train`, `y_train`, `x_test` and `y_test` and fits a linear regressor to predict y. It then plots the test data along with the resulting predictions. 

In [None]:
def one_feature_regressor(x_train, y_train, x_test, y_test):  
    ### Your code starts here ###
    x_train = np.array(x_train).reshape(-1,1)
    x_test = np.array(x_test).reshape(-1,1)
    y_train = np.array(y_train).reshape(-1,1)
    y_test = np.array(y_test).reshape(-1,1)
    

    # Create a linear regression object and fit a model
    regr = linear_model.LinearRegression()
    
    regr.fit(x_train,y_train)

    
    # Make predictions using the model 
    y_pred = regr.predict(x_test)

    ### End of your code  ###

    # Plot test data points as well as predictions
    plt.scatter(x_test, y_test)
    plt.scatter(x_test, y_pred)
    plt.xlabel('X', fontsize=14)
    plt.ylabel('Y', fontsize=14)
    plt.show()
    print ("MSE: %.5f" %  mean_squared_error(y_test, y_pred))
    return regr

Testing the defined `one_feature_regressor` function with feature X2 to predict Y1.

In [None]:
regr= one_feature_regressor(train['X2'],train['Y1'],test['X2'],test['Y1'])
print ("w = %.5f" % regr.coef_[0][0])
print ("b = %.5f" % regr.intercept_[0])
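
For a single feature, the fitted `w` and `b` can also be cross-checked against the closed-form least-squares solution, w = cov(x, y) / var(x) and b = mean(y) - w * mean(x). The sketch below does this on synthetic data; `x_demo`, `y_demo` and the chosen coefficients are assumptions for illustration and are not taken from the ENB2012 dataset.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
x_demo = rng.uniform(500, 800, size=50)             # synthetic predictor
y_demo = 0.05 * x_demo + 3 + rng.normal(0, 1, 50)   # synthetic target

# Closed-form least-squares solution for one feature
w_closed = np.cov(x_demo, y_demo, ddof=0)[0, 1] / np.var(x_demo)
b_closed = y_demo.mean() - w_closed * x_demo.mean()

# Same fit with scikit-learn for comparison
regr_demo = linear_model.LinearRegression().fit(x_demo.reshape(-1, 1), y_demo)
print(w_closed, regr_demo.coef_[0])
print(b_closed, regr_demo.intercept_)
```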



### 6. Predict target using a subset of features 

You can predict a target using more than one feature. Complete the code for the following function. 

In [None]:
def subset_feature_regressor(x_train, y_train):
    # YOUR CODE GOES HERE
    x_train = np.asarray(x_train)
    y_train = np.asarray(y_train)

    # Create a linear regression object and fit it to the training data
    regr = linear_model.LinearRegression()
    regr.fit(x_train, y_train)
    # END of YOUR CODE
    return regr

In [None]:
regr = subset_feature_regressor(train[['X1', 'X2']], train['Y1'])
print ("w = ", regr.coef_)
print ("b = ", regr.intercept_)
print ("MSE: ", mean_squared_error(test['Y1'], regr.predict(test[['X1', 'X2']])))


Finally, use all 8 features.

In [None]:
regr = subset_feature_regressor(train.iloc[:,0:8], train['Y1'])
print ("w = ", regr.coef_)
print ("b = ", regr.intercept_)
print ("MSE: ", mean_squared_error(test['Y1'], regr.predict(test.iloc[:,0:8])))
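
With several features, the same least-squares fit can be obtained directly by appending a column of ones to the feature matrix and solving the resulting linear system. The sketch below compares `numpy.linalg.lstsq` with `LinearRegression` on synthetic data; the data and the names `X_demo`, `X_aug`, `w_ls` and `b_ls` are assumptions for the example.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))  # synthetic features
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(0, 0.1, 100)

# Append a column of ones so that the last coefficient plays the role of b
X_aug = np.hstack([X_demo, np.ones((len(X_demo), 1))])
theta, *_ = np.linalg.lstsq(X_aug, y_demo, rcond=None)
w_ls, b_ls = theta[:-1], theta[-1]

# Same fit with scikit-learn for comparison
regr_demo = linear_model.LinearRegression().fit(X_demo, y_demo)
print(np.allclose(w_ls, regr_demo.coef_), np.isclose(b_ls, regr_demo.intercept_))
```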

## Question 2a: Implement an iterative solution  

In this section, you are required to implement the iterative (gradient descent) solution. The method should take the features `x` and targets `y` of the training set and return the parameter values, including the bias term. You should also initialize the hyper-parameters at the beginning of the method. Also, plot the cost function at different iterations.
Here, the input consists of:
* training data `x`, and  `y`, which are numpy arrays of dimension `m`-by-`n` and `m`, respectively (if there are `m` training points and `n` features)

The function should find the `n`-dimensional vector `w` and offset `b` that minimize the MSE loss function, and return:
* `w` and `b`
* `losses`, an array containing the MSE loss at each iteration
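
For reference, the quantities involved can be written out as follows. This is a sketch in the notation of this section (`m` training points, weight vector `w`, offset `b`); the symbol $\alpha$ for the learning rate is an assumption introduced here.

$$ J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - w^\top x_i - b\right)^2 $$

$$ \frac{\partial J}{\partial w} = -\frac{2}{m}\sum_{i=1}^{m} x_i\left(y_i - w^\top x_i - b\right), \qquad \frac{\partial J}{\partial b} = -\frac{2}{m}\sum_{i=1}^{m}\left(y_i - w^\top x_i - b\right) $$

$$ w \leftarrow w - \alpha\,\frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha\,\frac{\partial J}{\partial b} $$

Dropping the factor 2 (that is, minimizing $J/2$) only rescales the learning rate and does not change the minimizer.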

<span style="color:red">Note:</span> First read and understand the lecture material. Next, when implementing gradient descent, think carefully about two issues.

1. What is the step size (learning rate)?
2. When has the procedure converged?

Take the time to experiment with different ways of handling these.
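
One way to experiment with both questions is sketched below on synthetic, well-scaled data: a small gradient descent routine that stops when the loss changes by less than a tolerance, run with several learning rates. The function `gd_demo`, its defaults and the data are assumptions for illustration, not the required solution.

```python
import numpy as np

def gd_demo(x, y, learning_rate, max_iter=5000, tol=1e-10):
    """Gradient descent on the MSE of y ~ w*x + b; returns w, b and the loss history."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(max_iter):
        residual = y - (w * x + b)
        losses.append(float(np.mean(residual ** 2)))
        # Stop when the loss no longer changes by more than tol
        if len(losses) > 1 and abs(losses[-2] - losses[-1]) <= tol:
            break
        # Step against the MSE gradient (the gradient carries a minus sign)
        w += learning_rate * 2 * np.mean(x * residual)
        b += learning_rate * 2 * np.mean(residual)
    return w, b, losses

rng = np.random.default_rng(0)
x_demo = rng.uniform(-1, 1, 200)  # synthetic predictor in [-1, 1]
y_demo = 3.0 * x_demo + 1.0 + rng.normal(0, 0.1, 200)

for lr in (0.001, 0.1, 0.5):
    w, b, losses = gd_demo(x_demo, y_demo, lr)
    print(f"lr={lr}: stopped after {len(losses)} iterations, final MSE {losses[-1]:.4f}")
```

A very small step size tends to exhaust the iteration budget before the tolerance is reached, while a suitably larger one stops early; features on a large scale (such as X2) generally need a much smaller step or prior rescaling.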

In [None]:
def linear_regression_GD(X, Y, iteration = 10000, learning_rate = 1e-7,threshhold = 1e-7):
    # inputs: X and Y, the features (m-by-n) and the target (m values) of the training set
    # outputs: the weight vector W (n-by-1), the bias b and the list of MSE losses per iteration

    # YOUR CODE GOES HERE
    # Set up the data: Y is forced into a column vector so that Y - Y_hat stays m-by-1
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(-1, 1)
    n = float(len(X))  # number of training points

    # Random initialisation: all weights share one value drawn from {-1, 0}
    W = np.full((X.shape[1], 1), float(np.random.randint(-1, 1)))
    b = float(np.random.randint(0, 80))

    cost = []
    iterations = []
    prev_cost = None

    # The iterative loop to update the parameters
    for i in range(iteration):

        # Predictions and residuals with the current parameters
        Y_hat = np.dot(X, W) + b
        residual = Y - Y_hat

        # Gradients of half the MSE with respect to W and b (note the minus sign)
        gradw = -np.dot(X.T, residual) / n
        gradb = -float(np.mean(residual))

        # The updated values of the W vector and the bias term
        W = W - (gradw * learning_rate)
        b = b - (gradb * learning_rate)

        # Record the loss of the predictions made in this iteration
        current_cost = float(mean_squared_error(Y, Y_hat))
        cost.append(current_cost)
        iterations.append(i)

        # Stop once the loss changes by no more than the threshold
        if prev_cost is not None and abs(prev_cost - current_cost) <= threshhold:
            break
        prev_cost = current_cost

        # Print the parameters for each 1000th iteration
        if (i + 1) % 1000 == 0:
            print(f"Iteration {i+1}: Cost {current_cost}, Weight \n {W}, \n Bias {b} ")

    # Plot the cost for each iteration
    plt.plot(iterations, cost)
    plt.xlabel("Iteration")
    plt.ylabel("Cost (MSE)")
    plt.show()
    return W, b, cost
    # END of YOUR CODE

In [None]:
W_all, b_all, cost_all = linear_regression_GD(train.iloc[:,0:8], train[['Y1']])
print("w = ", W_all.ravel())
print("b = ", b_all)

## Question 2b: Using your iterative approach 

* Fit a model to predict Y1 using X2 based on the training data in Question 1 
* Predict Y1 using the testing data and compare the results with those obtained in Question 1
* Write your comments 

## Using the multi-feature function

In [None]:
# YOUR CODE GOES HERE

w , b , cost_multi =linear_regression_GD(train[['X2']],train[['Y1']])

# END of YOUR CODE 

In [None]:
# Predict Y1 on the test set with the fitted weight and bias
w_value = float(np.ravel(w)[0])
YY = test.X2 * w_value + b

# Plot the test data (red) and the predictions
plt.figure(figsize = (8,6))
plt.scatter(test.X2, test.Y1, marker='o', color='red')
plt.scatter(test.X2 , YY)
plt.xlabel("X")
plt.ylabel("Y")

plt.show()

print(f"Estimated Weight: {w_value}\nEstimated Bias: {float(b)} \nCost: {mean_squared_error(test.Y1, YY)}")


With the iterative approach we get MSE = 63, provided the learning rate is set appropriately, which is lower than the value obtained with the sklearn function. The number of iterations should be large enough for the function to converge, i.e. until $|\text{cost}[i] - \text{cost}[i-1]| \le \epsilon$. The right learning rate (step size) depends on the sample itself. One option is to derive it from the second derivative of the loss function (a Newton-style step would use its inverse as the step size): $$ \frac{\partial^2}{\partial \theta_1^2} J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^{m} x_i^2$$ 
Here, however, I used a small scalar instead, because I did not get consistent results when setting the learning rate from the second partial derivative with respect to w (i.e. $\theta_1$).
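
To make the role of the second derivative concrete, the sketch below applies a single Newton step to a simplified model without intercept, y ≈ w·x, with loss J(w) = 1/(2m) Σ (w·x_i − y_i)². Dividing the gradient by the second derivative (the mean of x²) reaches the least-squares weight in one step, because the loss is quadratic in w. The data and variable names are assumptions for illustration; this is not the procedure used in the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = rng.uniform(500, 800, 100)  # synthetic predictor on a large scale
y_demo = 0.05 * x_demo + rng.normal(0, 1, 100)

w = 0.0  # arbitrary starting weight
grad = -np.mean(x_demo * (y_demo - w * x_demo))  # dJ/dw at the current w
hess = np.mean(x_demo ** 2)                      # d2J/dw2 = mean of x^2

w_newton = w - grad / hess  # one Newton step: step size is 1 / hess
w_exact = np.sum(x_demo * y_demo) / np.sum(x_demo ** 2)  # least squares without intercept
print(w_newton, w_exact)
```

With an intercept, the second derivatives form a 2-by-2 matrix and the step has to use its inverse, so scaling only by the second derivative with respect to the weight is no longer exact.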















### Appendix: Another method using one feature only
With this method we can see how the weight relates to the cost during gradient descent.

In [None]:
def gradient_descent_one_feature(x, y, iterations=5000, learning_rate = 1e-3,
                     stopping_threshold = 1e-8):
     
    # Initializing weight and bias; n is the number of training points
    w = np.random.randint(-1,0)  # high is exclusive, so this always gives -1
    b = np.random.randint(-100,100)
    n = float(len(x))
     
    costs = []
    weights = []
    prev_cost = 0.0
     
    # Estimation of optimal parameters
    for i in range(iterations):
         
        # Making predictions
        y_hat = (w * x) + b
         
        # Calculating the current cost
        current_cost = mean_squared_error(y, y_hat)
 
        # If the change in cost is less than or equal to
        # stopping_threshold we stop the gradient descent
        if prev_cost and abs(prev_cost-current_cost)<=stopping_threshold:
            break
         
        prev_cost = current_cost
 
        costs.append(current_cost)
        weights.append(w)
         
        # Calculating the gradients
        dldw = -(1/n) * np.sum(x * (y-y_hat))
        dldb = -(1/n) * np.sum(y-y_hat)
         
        # Updating weights and bias
        w = w - (learning_rate * dldw)
        b = b - (learning_rate * dldb)
                 
        # Printing the parameters for each 1000th iteration
        if (i + 1) % 1000 == 0:
            print(f"Iteration {i+1}: Cost {current_cost}, Weight {w}, Bias {b}")
     
     
    # Visualizing the weights and cost for all iterations
    plt.figure(figsize = (8,6))
    plt.plot(weights, costs)
    plt.scatter(weights, costs, marker='o', color='red')
    plt.title("Cost vs Weights")
    plt.ylabel("Cost")
    plt.xlabel("Weight")
    plt.show()
     
    return w, b, costs

In [None]:
estimated_weight, estimated_bias, cost = gradient_descent_one_feature(train['X2'], train.Y1, iterations=10000, learning_rate=1e-7)
print(f"Estimated Weight: {estimated_weight}\nEstimated Bias: {estimated_bias} \nCost: {cost[-1]}")

# Predictions on the test set
ys = estimated_weight*test.X2 + estimated_bias

In [None]:
# Plot the test data (red) and the predictions
plt.figure(figsize = (8,6))
plt.scatter(test.X2, test.Y1, marker='o', color='red')
plt.scatter(test.X2 , ys)
plt.xlabel("X")
plt.ylabel("Y")

plt.show()

print(f"Estimated Weight: {estimated_weight}\nEstimated Bias: {estimated_bias} \nCost: {mean_squared_error(test.Y1, ys)}")
