## PROBLEM:
In this exercise, you are going to use the kangaroo's nasal dimension data. Use the gradient descent algorithm to estimate the optimal intercept and gradient (slope) for this problem. Report your gradient values.
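
_For reference, the quantities asked for can be written down explicitly. Assuming a simple linear model with intercept $b_0$ and gradient $b_1$ (symbols introduced here for illustration) and a mean-squared-error loss, batch gradient descent repeats the updates below with learning rate $\eta$. This is a sketch of the textbook batch version. `SGDRegressor` instead updates after each individual sample and, by default, adds an L2 penalty._

$$
\hat{y}_i = b_0 + b_1 x_i, \qquad J(b_0, b_1) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right)^2
$$

$$
\frac{\partial J}{\partial b_0} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right), \qquad
\frac{\partial J}{\partial b_1} = -\frac{2}{n}\sum_{i=1}^{n}x_i\left(y_i - b_0 - b_1 x_i\right)
$$

$$
b_0 \leftarrow b_0 - \eta\,\frac{\partial J}{\partial b_0}, \qquad b_1 \leftarrow b_1 - \eta\,\frac{\partial J}{\partial b_1}
$$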

_Since we will need the `pandas` package to perform various aspects of this analysis, we can import this now:_

In [None]:
import pandas as pd

_First we need to load in the kangaroo nasal dimension data:_

> _Note: __[this](https://www.reddit.com/r/learnpython/comments/pxretf/read_excel_xls_file/)__ source was used to find out how to load '.xls' files:_

In [10]:
x_y_data = pd.read_excel(r'slr07.xls', engine = 'xlrd')

*** No CODEPAGE record, no encoding_override: will use 'iso-8859-1'


_To build a linear regression model using gradient descent, we import `SGDRegressor` from `sklearn.linear_model`, which implements a plain stochastic gradient descent learning routine._
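
_To make the idea concrete, here is a minimal from-scratch sketch of batch gradient descent for a single predictor using NumPy. The function name `gradient_descent`, the zero starting values, the step size and the synthetic data are all assumptions for illustration. Unlike `SGDRegressor`, it uses every observation in each update and has no regularization term._

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.01, n_iters=5000):
    """Fit y ~ b0 + b1 * x by full-batch gradient descent on the mean squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    b0, b1 = 0.0, 0.0  # start intercept and gradient at zero (an arbitrary choice)
    for _ in range(n_iters):
        residuals = y - (b0 + b1 * x)
        # Partial derivatives of J = mean(residuals ** 2)
        grad_b0 = -2.0 / n * residuals.sum()
        grad_b1 = -2.0 / n * (x * residuals).sum()
        # Step against the gradient
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

# Hypothetical, noise-free data on the line y = 3 + 2x (not the kangaroo data)
x_demo = np.linspace(0.0, 1.0, 50)
y_demo = 3.0 + 2.0 * x_demo
b0, b1 = gradient_descent(x_demo, y_demo, learning_rate=0.5, n_iters=5000)
print(round(b0, 4), round(b1, 4))
```

_Because the synthetic points lie exactly on a line, the estimates approach the true intercept 3 and gradient 2._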

In [11]:
from sklearn.linear_model import SGDRegressor

_Next, we create the predictor variable `X` from the `X` column of `x_y_data` and the response variable `y` from its `Y` column._

In [12]:
X = x_y_data['X'].to_numpy().reshape(-1, 1)  # predictor as a 2-D array, shape (n_samples, 1)
y = x_y_data['Y']  # response, shape (n_samples,)
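
_The reshape matters because scikit-learn estimators expect the predictors as a two-dimensional array of shape (n_samples, n_features), whereas a single DataFrame column is one-dimensional. A small sketch with made-up values:_

```python
import numpy as np

# Made-up predictor values standing in for one DataFrame column
x_values = np.array([1.0, 2.0, 3.0])
print(x_values.shape)   # (3,): a single axis, not a feature matrix

# -1 lets NumPy infer the number of rows; 1 requests exactly one column (one feature)
X_matrix = x_values.reshape(-1, 1)
print(X_matrix.shape)   # (3, 1): rows are samples, columns are features
```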

_Then we can fit a linear regression model through plain stochastic gradient descent using `SGDRegressor`._

> _Note: after this model initially predicted values much larger than expected, I consulted __[this](https://stackoverflow.com/questions/31443840/sgdregressor-nonsensical-result/42960510)__ source to resolve the issue. The problem was that the **default** initial learning rate (`eta0`) of 0.01 is too large for this data, so each update took too large a step. After experimenting with different values for this parameter, I reached much more reasonable results._
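
_The effect of an overly large step can be reproduced with plain batch gradient descent. The sketch below uses hypothetical synthetic data whose predictor values are in the hundreds (an assumption, chosen only to mimic unscaled measurements) and compares the mean squared error after 20 updates for two learning rates. It is not `SGDRegressor` itself, only the same failure mode; the helper `mse_after_gd` is hypothetical._

```python
import numpy as np

def mse_after_gd(x, y, learning_rate, n_iters=20):
    """Run full-batch gradient descent on y ~ b0 + b1 * x and return the final MSE."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        residuals = y - (b0 + b1 * x)
        # Move against the MSE gradient (the minus signs cancel)
        b0 += learning_rate * 2.0 / n * residuals.sum()
        b1 += learning_rate * 2.0 / n * (x * residuals).sum()
    residuals = y - (b0 + b1 * x)
    return float(np.mean(residuals ** 2))

# Hypothetical unscaled predictor values and an exact linear response
x_demo = np.linspace(500.0, 900.0, 40)
y_demo = 50.0 + 0.3 * x_demo

initial_mse = float(np.mean(y_demo ** 2))               # MSE with b0 = b1 = 0
mse_large = mse_after_gd(x_demo, y_demo, learning_rate=0.01)  # same value as the default eta0
mse_small = mse_after_gd(x_demo, y_demo, learning_rate=1e-6)
print(initial_mse, mse_large, mse_small)
```

_With the larger step the error grows by orders of magnitude at every update, while the smaller step reduces it. Standardizing the predictor is a commonly used alternative to shrinking the learning rate._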

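> _Note: all other parameters were left at their default values. In particular, `alpha` = 0.0001 is the strength of the L2 regularization term (it is not the learning rate, which is controlled by `eta0` under the default `learning_rate='invscaling'` schedule), and `max_iter` = 1000 is the maximum number of passes over the training data (epochs), as described in the `sklearn.linear_model.SGDRegressor` documentation._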
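
_Under the default `learning_rate='invscaling'` schedule, the scikit-learn documentation gives the step size as `eta = eta0 / pow(t, power_t)`, where `t` counts the weight updates and `power_t` defaults to 0.25. So `eta0` is only the starting value and the effective step shrinks as training proceeds. A small sketch of that formula (the helper name is hypothetical):_

```python
def invscaling_learning_rate(eta0, t, power_t=0.25):
    """Step size for update number t under the 'invscaling' schedule: eta0 / t**power_t."""
    return eta0 / t ** power_t

# eta0 = 0.00001 as in the fit below; power_t = 0.25 is SGDRegressor's default
for t in (1, 16, 10_000):
    print(t, invscaling_learning_rate(1e-5, t))
```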

In [13]:
clf = SGDRegressor(eta0=0.00001)  # initial learning rate far below the default of 0.01
clf.fit(X, y)  # fit by stochastic gradient descent

SGDRegressor(eta0=1e-05)

_Now we can check if our model is predicting reasonable values given `X` as the input._

In [14]:
clf.predict(X)

array([199.62601307, 206.10671437, 203.19039879, 185.04443515,
       211.29127542, 162.03794553, 198.65390788, 216.15180139,
       206.43074944, 220.04022217, 254.38793906, 201.89425853,
       237.86215075, 264.75706114, 254.38793906, 268.96951699,
       246.93513257, 232.35355464, 229.43723906, 262.48881569,
       279.33863907, 273.83004296, 271.23776244, 282.25495465,
       208.05092476, 185.36847021, 184.39636502, 190.22899619,
       195.41355723, 195.73759229, 208.37495983, 183.42425982,
       201.57022346, 242.07460659, 221.6603975 , 221.01232737,
       206.10671437, 226.52092347, 232.35355464, 238.83425594,
       249.52741309, 224.57671308, 234.6218001 , 241.1025014 ,
       266.70127153])

_These values appear close to the actual `Y` values in the dataset, which suggests that the model is working._
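
_Closeness can also be quantified. One option is to compare the gradient descent estimates with the closed-form least-squares line and to compute the coefficient of determination $R^2$. The sketch below does this on synthetic stand-in data (assumed values, not the kangaroo measurements). With the real data one could pass `y` and `clf.predict(X)` to the hypothetical `r_squared` helper (or call `clf.score(X, y)`, which returns $R^2$ for scikit-learn regressors) and compare `clf.coef_` and `clf.intercept_` with `np.polyfit(x_y_data['X'], y, 1)`._

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT the kangaroo measurements)
rng = np.random.default_rng(0)
x_demo = rng.uniform(500.0, 900.0, size=45)
y_demo = 50.0 + 0.3 * x_demo + rng.normal(0.0, 10.0, size=45)

# Closed-form least-squares line; for degree 1, np.polyfit returns [slope, intercept]
slope_ols, intercept_ols = np.polyfit(x_demo, y_demo, 1)
r2_ols = r_squared(y_demo, intercept_ols + slope_ols * x_demo)
print(slope_ols, intercept_ols, r2_ols)
```

_Since least squares minimizes the squared error directly, its $R^2$ is an upper bound for any other straight line on the same data, which makes it a useful benchmark for a gradient descent fit._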

_Next, we can extract the coefficient and intercept values of our model._

In [15]:
coef = clf.coef_
print(coef)

intercept = clf.intercept_
print(intercept)

[0.32403507]
[2.28865848]
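
_These two numbers fully determine the fitted line: for a linear model, `predict` computes `X @ coef_ + intercept_`. The sketch below rebuilds predictions by hand from the printed values for two hypothetical inputs. Because the printed values are rounded, results can differ from `clf.predict` in the last decimal places._

```python
import numpy as np

# Values printed above for clf.coef_ and clf.intercept_
coef = np.array([0.32403507])
intercept = np.array([2.28865848])

# Hypothetical predictor values, shaped (n_samples, 1) like X
X_new = np.array([[500.0], [700.0]])

# Linear prediction: matrix-vector product plus the (broadcast) intercept
y_hat = X_new @ coef + intercept
print(y_hat)
```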


_From this output, gradient descent estimates the optimal gradient (slope) as approximately 0.324 and the intercept as approximately 2.289._