**Gradient Descent Notes (consider reformulating in linear algebra):**

For simple linear regression, we define:<br>
$y_i = \theta_{0}+\theta_{1} x_i + \epsilon_{i}$
<br>
$\implies \epsilon_i = y_i - (\theta_0 + \theta_1 x_i)$
<br>
<br>


The loss function (L) is defined for least squares regression by:<br>
$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} ( y_i - (\theta_0 + \theta_1 x_i) ) ^2$
<br>
<br>
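
As a quick illustration of the loss defined above, here is a minimal sketch in plain Python. The function name `squared_error_loss` and the toy data are assumptions made up for this example; they are not part of the original notes.

```python
def squared_error_loss(theta0, theta1, xs, ys):
    """Sum of squared residuals L for the line y = theta0 + theta1 * x."""
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # these points lie exactly on y = 1 + 2x

loss_exact = squared_error_loss(1.0, 2.0, xs, ys)  # parameters of the true line
loss_shifted = squared_error_loss(0.0, 2.0, xs, ys)  # intercept off by 1
```

With these numbers, every residual is zero for the first call and exactly 1 for the second, so the loss grows as the line moves away from the data.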

To determine the parameters that minimize this loss we take the gradient and set it equal to zero:
<br>
$\nabla L = \frac{\partial L}{\partial \theta_0}\hat{\theta_0} + \frac{\partial L}{\partial \theta_1} \hat{\theta_1} = \left\langle \frac{\partial L}{\partial \theta_0}, \frac{\partial L}{\partial \theta_1} \right\rangle = 0$
<br>
$\frac{\partial L}{\partial \theta_0} = \frac{\partial}{\partial \theta_0} \sum_{i=1}^{n} (y_i-(\theta_0 + \theta_1 x_i))^2 = -2 \sum_{i=1}^{n} (y_i-\theta_0 - \theta_1 x_i) = 0$

$\implies \sum_{i=1}^{n} (\theta_0 + \theta_1 x_i - y_i) = 0$

$\implies n\theta_0 + \theta_1 \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i = 0$

Because $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \implies \sum_{i=1}^{n} x_i = n\bar{x}$,
<br>
Likewise, $\sum_{i=1}^{n} y_i = n\bar{y}$,
<br>
$\implies n\theta_0 + \theta_1 n\bar{x} - n\bar{y} = 0$
<br>
$\implies \theta_0 + \theta_1\bar{x} - \bar{y} = 0$
<br>
$\implies \theta_0 = \bar{y} - \theta_1\bar{x}$

Similarly,
<br>
$\frac{ \partial L }{ \partial \theta_1 } = 0 $
<br>
$\implies \theta_1 = \frac{ \sum_{i=1}^{n}(x_i - \bar{x})(y_i-\bar{y}) }{ \sum_{i=1}^{n} (x_i - \bar{x})^2 }$
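
The two closed-form results above translate almost line for line into code. The following is a minimal sketch in plain Python; the function name `fit_closed_form` and the toy data are assumptions for illustration, not the post's final implementation.

```python
def fit_closed_form(xs, ys):
    """Return (theta0, theta1) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula for theta_1.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    theta1 = s_xy / s_xx
    # Intercept from the first condition: theta_0 = y_bar - theta_1 * x_bar.
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = fit_closed_form(xs, ys)
```

Note that the sketch assumes the $x_i$ are not all identical; otherwise the denominator is zero and the slope is undefined.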

# What is the goal of this post?(Outline):
This outline has been brought to you by: https://www.grammarly.com/blog/how-to-write-a-blog/?q=esl&placement=&&utm_source=google&utm_medium=cpc&utm_campaign=10273012991&utm_targetid=aud-332861653181:dsa-1233402314764&gclid=Cj0KCQjwkbuKBhDRARIsAALysV6NvIqOgvQKXeg2ymq8MBtbUlUMVWO6J-XExD1lsv4Fib0iUpQ5TKcaAnT3EALw_wcB&gclsrc=aw.ds

##### **Simple Linear Regression: 6 Methods from Scratch in Python with Example Use Case Study**
- What is SLR?
  
- How does it work? (An overview that includes the introduction of the term 'estimation techniques')
  
- Estimation Techniques
  
- SLR Method 1 - the Linear Algebra Method
  (methodology goes here)
  
- SLR Method 2 - the QR Decomposition Method
  (methodology goes here) https://genomicsclass.github.io/book/pages/qr_and_regression.html
  
- SLR Method 3 - the SVD Decomposition Method
  (methodology goes here)
  
- SLR Method 4 - the Gradient Descent Method
  
  
- SLR Method 5 - the Sklearn Method
  
- SLR Method 6 - the Covariance Method
  
- Which to use when?
  
- Example Case Study
  (choose a dataset, determine which SLR method is best to use, and make sure the dataset has a single feature; maybe impact data from an accelerometer, maybe something more common, maybe particle physics data)
  (use the following link as a resource for what to include in the analysis: http://genomicsclass.github.io/book/. Also make sure to include exploratory analysis, data cleaning, etc., but do not include things outside the scope of the tutorial. Stick primarily to 'choosing the right SLR method for this case study' and 'use of that SLR method in this analysis'. Make heavy use of visualizations to craft a solid argument about the analysis, and read up on analysis techniques to present this properly)
  
NOTE: The following topics/subtopics are outside of the scope of this blogpost...
- tuning the $\alpha$ parameter (a quick argument for what to choose and how can be added in the SLR Method 4 section)
- optimization and execution time analysis (a quick plot showing the comparison without methodology, deep description etc. to aid the audience can be included)
  

**FINAL NOTE**
1. Make sure to do the writing, storytelling, and explanatory part of the post SEPARATE from the development/coding of the post.
2. Make sure the coding for the post is done without double dollar signs, because they will not render properly in the WordPress LaTeX system. Ditto for the \limits command in LaTeX; it doesn't work.
3. Make sure to review how to edit the LaTeX code to go from Jupyter notebook LaTeX to WordPress LaTeX. Consider writing a script to process the JSON file as needed for the final upload to the WordPress blog.
4. Make any hand edits necessary, and consider updating the processor code so that any new procedures can be handled all at once in a script in the future (see 3).

<hr style='border:solid 10px; border-color:black'>

## **Studies in Machine Learning: Episode 1 - Univariate Linear Regression from Scratch in Python**

#### **Overview**
Over the course of this blog post, I am going to...
- Explain the basic concepts of Univariate Linear Regression
- Develop the code for this in 6 methods from scratch starting from theoretical principles
- Give suggestions on how to choose the right method for real world application
- Provide an example use case study with a real world dataset

After understanding this information, the reader will be able to formulate some linear regression algorithms, code them from scratch, select an appropriate ULR algorithm for a dataset, and apply that algorithm to the dataset.

This blog post assumes the reader has programming skills in Python and mathematics skills that include linear algebra and some vector calculus.

#### **What is Univariate Linear Regression?**

<i>Univariate Linear Regression</i> (ULR), also known as <i>Simple Linear Regression</i>, is a model that fits a linear function to an observable feature graphed against a target variable. It is univariate because only a single feature is used to model the output variable. A model that fits a linear function of multiple observable features in the same way is <i>Multiple Linear Regression</i>, often loosely called <i>Multivariate Linear Regression.</i>

#### **How does it work?**

(**In Progress**)<br>
In order to better understand how ULR works, let's have a look at a real-world dataset.
(see https://www.kaggle.com/rezaunderfit/instagram-fake-and-real-accounts-dataset;
otherwise go with global warming, but it must be live data so that regression is actually needed, not a perfectly lined-up set of linear datapoints generated from a model)

Just introduce the dataset and relevant information at this point. Don't do everything here; save the rest of the analysis for after the methods have been developed.

##### Estimation Techniques

- Ordinary Least Squares
- Weighted Least Squares
- Generalized Least Squares
- Maximum likelihood estimation
- Ridge Regression
- Least Absolute Deviation
- Adaptive estimation
- Bayesian linear regression
- quantile regression
- mixed models
- principal component regression
- least angle regression
- Theil-Sen estimator
- alpha trimmed mean
- etc
- GRADIENT DESCENT

More on estimators here: https://en.wikipedia.org/wiki/Linear_regression
and here: https://en.wikipedia.org/wiki/List_of_algorithms#Optimization_algorithms

#### **Method 1 - Mean Squared Regression** 

We will look at the mean squares method first, formulated with linear algebra. Our hypothesis function is the equation of a line, familiar from algebra as:
<br><br>
$$ y = mx + b$$ 
is the same as,
$$ h(x) = \theta_1 x + \theta_0  $$
<div style="text-align: right"> (1) </div>
<br>
However, when fitting a linear model to data points, there will almost always be some error between the true values and the predicted values. This is known as the residual error. Including the residual error and looking at the ith dependent value, (1) becomes:
<br>
$$y_i = \theta_1 x_i + \theta_0 + e_i$$ where $e_i$ is the ith residual error.
<br><br>
If we rearrange this equation, then we see that the ith residual error value is:<br>
$$e_i = y_i - (\theta_1 x_i + \theta_0)$$<br>
<br>
The total residual error is then:<br>
$$e = \sum_{i=1}^n{e_i} = \sum_{i=1}^n \left( y_i - (\theta_1 x_i + \theta_0) \right)$$

Given $n$ datapoints, let $\bf{X}$ be the $(n \times 2)$ design matrix:<br>
$$\bf{X} = 
\begin{bmatrix}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_n
\end{bmatrix}
$$
and $\boldsymbol{\theta}$ be the $(2 \times 1)$ parameter matrix
$$
\boldsymbol{\theta} = \vec{\theta} =
\begin{bmatrix}
\theta_0 \\
\theta_1
\end{bmatrix}
$$
and $\textbf{e}$ be the $(n\times 1)$ residual error matrix:
$$
\textbf{e} = \vec{e} =
\begin{bmatrix}
e_1 \\
e_2 \\
\dots \\
e_n
\end{bmatrix}
$$
and $\textbf{Y}$ be the $(n \times 1)$ matrix of actual values:
$$
\textbf{Y} = \vec{y} =
\begin{bmatrix}
y_1 \\
y_2 \\
\dots \\
y_n
\end{bmatrix}
$$

<br>

Then, our original equation in matrix form looks like:<br>
$$ \textbf{Y} = \textbf{X} \boldsymbol{\theta} + \textbf{e} $$ <div style="text-align: right"> (2) </div> where 

$$ \bf{X}\boldsymbol{\theta} =
\begin{bmatrix}
\theta_0 + \theta_1 x_1 \\
\theta_0 + \theta_1 x_2 \\
\dots \\
\theta_0 + \theta_1 x_n \\
\end{bmatrix}
$$ 
<div style="text-align: right"> (3) </div>
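
The product in (3) can be checked numerically. Below is a minimal sketch assuming NumPy is available; the feature values and the parameter vector are arbitrary numbers chosen for illustration, not values from the case study.

```python
import numpy as np

# Hypothetical feature values and parameters, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
theta = np.array([1.0, 2.0])  # ordered as [theta_0, theta_1]

# Design matrix: a column of ones (for the intercept) next to the feature column,
# giving one row per data point.
X = np.column_stack([np.ones_like(x), x])  # shape (n, 2)

# Matrix-vector product: entry i equals theta_0 + theta_1 * x_i.
predictions = X @ theta
```

The column of ones is what lets the intercept $\theta_0$ be handled by the same matrix product as the slope.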

This implies that the residual error is the following:<br>
$$ \textbf{e} = \textbf{Y} - \textbf{X}\boldsymbol{\theta} $$


In order for $\textbf{e}$ to have defined values, given $\textbf{X}$ and $\textbf{Y}$, we need to determine the coefficients or parameters $\vec{\theta}$. There are several estimation techniques available to achieve this.
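
As a preview of what "determining the parameters" looks like in practice, here is a hedged sketch that uses NumPy's built-in least squares solver, `np.linalg.lstsq`, as one possible estimation technique. It is a stand-in for illustration, not the from-scratch derivation, and the noisy targets are made-up numbers. The residual vector is computed by rearranging equation (2).

```python
import numpy as np

# Hypothetical data: targets scattered around a line (made-up numbers).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Design matrix with an intercept column of ones.
X = np.column_stack([np.ones_like(x), x])

# lstsq returns (solution, residual sums of squares, rank, singular values).
theta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Residual vector from equation (2): Y = X theta + e.
e = y - X @ theta
```

At the least squares solution the residual vector is orthogonal to the columns of $\textbf{X}$, so `X.T @ e` should be zero up to floating point error, which is a handy sanity check for any of the methods.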