<h1>Linear Regression Report</h1>

<ul>
<li>Introduction</li>
<li>Procedure</li>
<li>Results</li>
<li>Conclusion</li>
</ul>

<h3>Introduction</h3>

In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models.

<img src="https://i.ytimg.com/vi/zPG4NjIkCjc/maxresdefault.jpg" height="650" width="650"/>

When the dataset is very large, training can take a very long time. Imagine having to train on petabytes of files. In this project, I use parallel processing to reduce the time needed to train a model.

<h4>Data</h4>

The dataset comes from https://www.kaggle.com/c/house-prices-advanced-regression-techniques. The data contains many features. For this experiment, I have already extracted the features that have a high correlation with the output, "SalePrice". The goal is to predict the price of a house from its properties using linear regression.

<h2>Procedure</h2>

Fitting a line to a dataset involves several steps:
<ul>
<li>Preprocessing (Clean data)</li>
<li>Load Data into Memory to process</li>
<li>Finding Optimal Weight (Normal Equation)</li>
<li>Evaluation using the validation set</li>
</ul>


<h3>Preprocessing</h3>
Here we need to clean the data. Cleaning can range from dealing with noise to feature scaling. This is done before the data is loaded into the Java program.

<h3>Loading the Data</h3>
We load the data into Java to calculate the weights that best fit the data. Here, I have separated the file into 4 parts.
In the sequential version, we need to read the parts one file at a time. In contrast, the parallel version can read all of them concurrently. Because this step takes a lot of time, the speed-up from the parallel version is large.
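
The parallel loading step can be sketched roughly as follows. This is a minimal sketch, not the project's actual loader: it assumes each part is a comma-separated file of numeric values with no header row, and the class name `ParallelLoader` and its methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

class ParallelLoader {
    // Parse one part into rows of doubles (assumes numeric cells and no header row).
    static List<double[]> readPart(Path part) {
        try {
            return Files.readAllLines(part).stream()
                    .filter(line -> !line.trim().isEmpty())
                    .map(line -> {
                        String[] cells = line.split(",");
                        double[] row = new double[cells.length];
                        for (int i = 0; i < cells.length; i++) {
                            row[i] = Double.parseDouble(cells[i].trim());
                        }
                        return row;
                    })
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read all parts concurrently; the rows keep the order of the `parts` list.
    static double[][] loadAll(List<Path> parts) {
        return parts.parallelStream()
                .map(ParallelLoader::readPart)
                .flatMap(List::stream)
                .toArray(double[][]::new);
    }
}
```

Calling `ParallelLoader.loadAll` with the paths of the 4 parts returns one matrix with all rows. The parallel stream runs on Java's common thread pool, so the number of files actually read at the same time depends on the machine.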

<h3>Finding Optimal Weights</h3>

Before finding the optimal weights, let's first look at how we predict the price. We first construct a matrix X. Each row of the matrix holds the details of one house, while each column represents one property. We also have a vector y, which contains the actual price of each house. Let's call y' the values that our model predicts given the matrix X.

$X = \begin{vmatrix}
7 & 1980 & 15 & 67 & 157 & 5 & 4 \\
5 & 1996 & 20 & 58 & 513 & 5  & 751\\
6 & 2010 & 67 & 7 & 6 & 6  & 37\\
\end{vmatrix}$

$y = \begin{vmatrix}
15000\\
30000\\
10000 \\
\end{vmatrix}$

The formula for predicting the price value is 

$ Xθ = y' $ 

$
\begin{vmatrix}
7 & 1980 & 15 & 67 & 157 & 5 & 4 \\
5 & 1996 & 20 & 58 & 513 & 5  & 751\\
6 & 2010 & 67 & 7 & 6 & 6  & 37\\
\end{vmatrix}
$
$
\begin{vmatrix}
2\\
4\\
6 \\
10 \\
26 \\
1 \\
50 \\
\end{vmatrix}$
=
$
\begin{vmatrix}
12981\\
59587\\
10536\\
\end{vmatrix}$

= $y'$
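
The prediction above is a plain matrix-vector product. As a minimal sketch in Java (the class name `Predictor` is a hypothetical name, and the numbers are the example values from above):

```java
import java.util.Arrays;

class Predictor {
    // Multiply an m-by-n feature matrix by an n-vector of weights to get m predictions.
    static double[] predict(double[][] x, double[] theta) {
        double[] yPred = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            if (x[i].length != theta.length) {
                throw new IllegalArgumentException("row " + i + " has the wrong length");
            }
            double sum = 0.0;
            for (int j = 0; j < theta.length; j++) {
                sum += x[i][j] * theta[j];
            }
            yPred[i] = sum;
        }
        return yPred;
    }

    public static void main(String[] args) {
        double[][] x = {
            {7, 1980, 15, 67, 157, 5, 4},
            {5, 1996, 20, 58, 513, 5, 751},
            {6, 2010, 67, 7, 6, 6, 37}
        };
        double[] theta = {2, 4, 6, 10, 26, 1, 50};
        // Prints [12981.0, 59587.0, 10536.0]
        System.out.println(Arrays.toString(predict(x, theta)));
    }
}
```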

In the example above, the weights do a fairly good job: the predicted prices for the first and third houses are close to the real prices, but the prediction for the second house is far off. The differences are

$\begin{vmatrix}
12981\\
59587\\
10536 \\
\end{vmatrix}$
-
$\begin{vmatrix}
15000\\
30000\\
10000 \\
\end{vmatrix}$
=
$\begin{vmatrix}
-2019\\
29587\\
536 \\
\end{vmatrix}$


We want y' to be as close to y as possible. Ideally, the difference y' - y would be 0:

$y'- y = 0 $

Now we can define a loss function, which indicates how well or how badly our model is doing. There are many loss functions to choose from. In this project, I use the root mean square error (RMSE).

<img src="https://cdn-images-1.medium.com/max/800/1*9hQVcasuwx5ddq_s3MFCyw.gif"/>

The RMSE for the example above is approximately 17124.58.
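
RMSE is the square root of the mean of the squared differences between y' and y. A minimal sketch of that calculation in Java, using the predicted and real prices from the example above (the class name `Rmse` is hypothetical):

```java
class Rmse {
    // Root mean square error: sqrt of the average squared difference.
    static double rmse(double[] yPred, double[] y) {
        if (yPred.length != y.length || y.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and equal in length");
        }
        double sumSquares = 0.0;
        for (int i = 0; i < y.length; i++) {
            double diff = yPred[i] - y[i];
            sumSquares += diff * diff;
        }
        return Math.sqrt(sumSquares / y.length);
    }

    public static void main(String[] args) {
        double[] yPred = {12981, 59587, 10536};
        double[] y = {15000, 30000, 10000};
        System.out.println(rmse(yPred, y));
    }
}
```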

<h2>Start finding optimal weights</h2>

We can use the normal equation to find the line that minimizes the error.

$θ=(X^TX)^{-1}X^Ty$

Here $θ$ is the vector containing one weight for each property, $X$ is the feature matrix, $X^T$ is the transpose of $X$, and $y$ is the vector of real outputs. Because the normal equation computes $θ$ directly, the weights do not need to be initialized.

But if we only use this equation, we might run into a problem called overfitting. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. This can make our model less accurate on data it has not seen.

One way to avoid overfitting in linear regression is to make sure that none of the weights is too large. So we need an equation that yields both small weights and a small error. The formula that does this is:

$θ=(X^TX+λ⋅L)^{-1}X^Ty$

We introduce a new term called lambda ($λ$), which controls how strongly large weights are penalized. I will skip the proof that this term makes the model less prone to overfitting. The best value of lambda depends on the problem.
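
A minimal sketch of the regularized formula in Java follows. It rests on two assumptions that are not stated above: $L$ is taken to be the identity matrix, and instead of computing the inverse explicitly, the code solves the equivalent linear system $(X^TX+λ⋅L)θ = X^Ty$ by Gaussian elimination with partial pivoting. The class name `RidgeNormalEquation` is hypothetical. Setting `lambda` to 0 gives the plain normal equation.

```java
import java.util.Arrays;

class RidgeNormalEquation {
    // Solve (X^T X + lambda * I) theta = X^T y for theta (assumes L = identity).
    static double[] fit(double[][] x, double[] y, double lambda) {
        int m = x.length, n = x[0].length;
        // Augmented matrix [X^T X + lambda * I | X^T y].
        double[][] a = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double dot = 0.0;
                for (int k = 0; k < m; k++) dot += x[k][i] * x[k][j];
                a[i][j] = dot + (i == j ? lambda : 0.0);
            }
            double rhs = 0.0;
            for (int k = 0; k < m; k++) rhs += x[k][i] * y[k];
            a[i][n] = rhs;
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new ArithmeticException("matrix is singular; try a larger lambda");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= factor * a[col][c];
            }
        }
        // Back substitution.
        double[] theta = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = a[i][n];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * theta[j];
            theta[i] = s / a[i][i];
        }
        return theta;
    }

    public static void main(String[] args) {
        // Toy data generated from y = 1 + 2x; the first column of ones is the intercept.
        double[][] x = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
        double[] y = {1, 3, 5, 7};
        System.out.println(Arrays.toString(fit(x, y, 0.0)));  // no regularization
        System.out.println(Arrays.toString(fit(x, y, 10.0))); // weights shrink toward 0
    }
}
```

On the toy data, the weights with `lambda = 10` are smaller than the weights with `lambda = 0`, which is the shrinking effect described above.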


<h3>Steps</h3>
Many of the steps in the formula can be done in parallel.
For example, when transposing $X$, we can divide $X$ into 4 parts and transpose the 4 parts on 4 processors.

Another operation that we can do in parallel is multiplication, because each cell of the result is independent of the others.

Addition can also be parallelized. As with multiplication, each cell of the sum does not depend on any other cell.
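
The three operations above can be sketched with Java parallel streams. This is only an illustration under an assumption: rather than dividing the matrix into exactly 4 parts, it lets each row of the result be computed as an independent task on Java's common thread pool. The class name `ParallelMatrix` is hypothetical.

```java
import java.util.stream.IntStream;

class ParallelMatrix {
    // Each row of the product depends only on one row of a, so rows run independently.
    static double[][] multiply(double[][] a, double[][] b) {
        if (a[0].length != b.length) {
            throw new IllegalArgumentException("inner dimensions do not match");
        }
        int inner = b.length, cols = b[0].length;
        double[][] c = new double[a.length][cols];
        IntStream.range(0, a.length).parallel().forEach(i -> {
            for (int k = 0; k < inner; k++) {
                double aik = a[i][k];
                for (int j = 0; j < cols; j++) c[i][j] += aik * b[k][j];
            }
        });
        return c;
    }

    // Each row of the transpose is one column of a, written by exactly one task.
    static double[][] transpose(double[][] a) {
        int rows = a.length, cols = a[0].length;
        double[][] t = new double[cols][rows];
        IntStream.range(0, cols).parallel().forEach(j -> {
            for (int i = 0; i < rows; i++) t[j][i] = a[i][j];
        });
        return t;
    }

    // Cell-by-cell addition; no cell depends on any other cell.
    static double[][] add(double[][] a, double[][] b) {
        double[][] sum = new double[a.length][a[0].length];
        IntStream.range(0, a.length).parallel().forEach(i -> {
            for (int j = 0; j < a[i].length; j++) sum[i][j] = a[i][j] + b[i][j];
        });
        return sum;
    }
}
```

Because every task writes to its own row of the output, no locking is needed.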

<h4>Done</h4>

<p>After finishing the calculation of the formula above, we obtain the optimal weights for our model. The next step is to evaluate the performance of those weights.</p>

<h2>Evaluation using the validation set</h2>

$ Xθ = y' $ 

$
\begin{vmatrix}
7 & 1980 & 15 & 67 & 157 & 5 & 4 \\
5 & 1996 & 20 & 58 & 513 & 5  & 751\\
6 & 2010 & 67 & 7 & 6 & 6  & 37\\
7 & 1980 & 15 & 67 & 157 & 5 & 4 \\
5 & 1996 & 20 & 58 & 513 & 5  & 751\\
6 & 2010 & 67 & 7 & 6 & 6  & 37\\
7 & 1980 & 15 & 67 & 157 & 5 & 4 \\
5 & 1996 & 20 & 58 & 513 & 5  & 751\\
6 & 2010 & 67 & 7 & 6 & 6  & 37\\
\end{vmatrix}
$
$
\begin{vmatrix}
5\\
1\\
2 \\
15 \\
7 \\
1.6 \\
5 \\
\end{vmatrix}$
=
$
\begin{vmatrix}
4177\\
10285\\
2515.6\\
4177\\
10285\\
2515.6\\
4177\\
10285\\
2515.6\\
\end{vmatrix}$

As above, we substitute X, θ, and y to find the root mean square error. When predicting, we can also use parallel processing to compute the answer. For example, we can separate X into 4 parts and then multiply each part by $θ$. If the error is high, you might want to change some parameters, such as $λ$.
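
Splitting X into 4 parts and multiplying each part by $θ$ can be sketched with a fixed thread pool. This is a minimal sketch under assumptions: the class name `ChunkedEvaluator` and the `parts` parameter are hypothetical, and the rows are divided into contiguous chunks of nearly equal size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedEvaluator {
    // Compute y' = X * theta by giving each of `parts` threads one block of rows.
    static double[] predict(double[][] x, double[] theta, int parts) {
        double[] yPred = new double[x.length];
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            int chunk = (x.length + parts - 1) / parts; // ceiling division
            for (int p = 0; p < parts; p++) {
                final int from = Math.min(x.length, p * chunk);
                final int to = Math.min(x.length, from + chunk);
                tasks.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        double sum = 0.0;
                        for (int j = 0; j < theta.length; j++) sum += x[i][j] * theta[j];
                        yPred[i] = sum; // each task writes only its own rows
                    }
                }));
            }
            // Wait for every chunk to finish and surface any failure.
            for (Future<?> task : tasks) {
                try {
                    task.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return yPred;
    }
}
```

Calling `ChunkedEvaluator.predict(x, theta, 4)` returns the same vector as a sequential multiplication. The predictions can then be compared with the real prices to compute the RMSE.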

That's all for fitting a best-fit line to a dataset.

<h2>Results</h2>