# DataSet Example


We use a simple dataframe for demonstration

In [151]:
import pandas as pd

df = pd.read_csv("C:/Users/ZHou/Desktop/Python-Study-Notes/MachineLearning/MachingLearningFromStandford/My solution to Course Assignments/Linear Regression/ex1data1.txt",
                header=None, names=['Population', 'Profit'])

In [152]:
df.head()

Unnamed: 0,Population,Profit
0,6.1101,17.592
1,5.5277,9.1302
2,8.5186,13.662
3,7.0032,11.854
4,5.8598,6.8233


In [153]:
import numpy as np
x0 = pd.DataFrame({"x0": np.ones(df.shape[0])})
X = pd.concat([x0, df.iloc[:, 0]], axis = 1)
y = df.iloc[:, 1:]

# Algo1: Normal Equation

The best parameter can be directly calculated by
 $$ \Theta = (X^TX)^{-1}X^Ty$$

In [154]:
# Using normal equations
theta = np.linalg.inv(X.T@X)@X.T@y
theta

Unnamed: 0,Profit
0,-3.895781
1,1.193034


Since it requires inverse, then its running time would be around $O(n^3)$

# Algo2: SVD (Singular Value Decomposation)

It is directly implemented with LinearRegressio in sklearn library

In [156]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(df.iloc[:, 0:1], y)
model.intercept_, model.coef_

(array([-3.89578088]), array([[1.19303364]]))

The run time is around $O(n^2)$

# Algo3: Bath Gradient Descent

Batch - the whol dataset

In simple words, it performs **gradient descent** on the **entire dataset**.

Very slow when it comes to a large dataset (many rows)

In [157]:
# batch gradiant descent compute manually
alpha = 0.01
iters = 2000

X1 = np.matrix(X)
y1 = np.matrix(y)

theta = np.matrix([[0,0]]).T
print(X.shape, y.shape, theta.shape)

def gradientDescent(X, y, theta, alpha, iters):
    for times in range(iters):
        gradient = 1/len(X) * X.T@(X@(theta) -y)
        theta = theta - alpha * gradient
    return theta

theta = gradientDescent(X1, y1, theta, alpha, iters)

theta

(97, 2) (97, 1) (2, 1)


matrix([[-3.78806857],
        [ 1.18221277]])

# Algo4: Stochastic Gradient Descent

Stochastic - Random

Randomly pick **one entry** of the dataset and do the **gradient descent**

Very fast compared to batch., and it jump out of the local minimum because it is so random.

(large number of iterations are needed to for to converge well)

In [186]:
from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(max_iter = 1000000000, tol=1e-3, penalty=None, eta0=0.1)
X = df.iloc[:, 0:1]
sgd_reg.fit(X, np.array(y).ravel())

sgd_reg.intercept_, sgd_reg.coef_

(array([-3.59710336]), array([0.96434126]))

# Algo 5: Mini-Batch Gradient Descent

Mini-Batch - samll subset of the entire dataset

Randomly pick **a small subset of the entire dataset** and do the gradient descent

Still very faset compared to batch, but may be hard to jump out of the local minimum

It generally require less iterations than the stochastic one requires

In [180]:
mini_reg = SGDRegressor(max_iter = 100000, tol=1e-3, penalty=None, eta0=0.1)
X = df.iloc[:, 0:1]
mini_reg.partial_fit(X, np.array(y).ravel())

mini_reg.intercept_, sgd_reg.coef_

(array([-3.79386535]), array([1.34804044]))