## Linear regression with one variable

In this notebook we will implement linear regression and see it work on real data. Linear regression is one of the oldest and most widely used predictive models in machine learning. Its goal is to fit a straight line to a set of data points by minimizing the sum of the squared errors between the line's predictions and the observed values.
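The following minimal sketch illustrates this objective. It scores a candidate line $y = θ_0 + θ_1 x$ by its sum of squared errors on a small synthetic dataset. The data and the helper name `sum_squared_errors` are hypothetical and do not come from this notebook. NumPy's `polyfit` with degree 1 returns the least-squares line, so nudging either of its coefficients should only increase the error.

```python
import numpy as np

def sum_squared_errors(theta0, theta1, x, y):
    """Sum of squared vertical distances between the line and the data."""
    predictions = theta0 + theta1 * x
    return float(((predictions - y) ** 2).sum())

# Hypothetical toy data scattered around y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# polyfit(x, y, 1) returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(x, y, 1)
best = sum_squared_errors(intercept, slope, x, y)

# Any perturbation of the fitted line increases the error
for d0, d1 in [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.05)]:
    assert sum_squared_errors(intercept + d0, slope + d1, x, y) > best

print(f"intercept={intercept:.3f}, slope={slope:.3f}, SSE={best:.4f}")
```
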

As a running example, we use a file containing the dataset for our linear regression problem. The first column is the **population** of a city and the second column is the **profit** of a store in that city. A negative profit value indicates a loss.


### **Data Visualization**

Before starting, it is useful to understand the data by **visualizing** it. We use a **scatter plot**, because the data has only two variables to plot (population and profit). Many real-world problems are multi-dimensional and cannot be plotted in two dimensions.

In [None]:
from numpy import loadtxt, zeros, ones, array, linspace, logspace, ones_like
from pylab import scatter, show, title, xlabel, ylabel, plot, contour



#Load the dataset (assumes two comma-separated columns: population, profit)
data = loadtxt('ex1data1.txt', delimiter=',')

#Plot the data
scatter(data[:, 0], data[:, 1], marker='o', c='b')
title('Profits distribution')
xlabel('Population of City in 10,000s')
ylabel('Profit in $10,000s')
show()

### **Cost Function**

As you perform gradient descent to minimize the cost function J(θ), it is helpful to monitor convergence by computing the cost at each step. *(For reference, see the **LinearRegression_GradientDescent.pdf** file.)*
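For reference, the usual cost function for linear regression is given below. We assume it matches the referenced PDF, and the notation is ours. Take the hypothesis $h_θ(x) = θ_0 + θ_1 x$ and $m$ training examples $(x^{(i)}, y^{(i)})$:

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
$$

The factor $\frac{1}{2}$ only rescales J. It cancels when differentiating, which keeps the gradient formula simple.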

In [None]:
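#Cost function for linear regression

def compute_cost(X, y, theta):
    '''
    Compute cost for linear regression
    '''
    #Number of training samples
    m = y.size

    #Predictions h_theta(x) = X . theta, flattened to shape (m,) so that
    #subtracting y (shape (m,)) does not broadcast into an (m, m) matrix
    predictions = X.dot(theta).flatten()

    #Half the mean of the squared errors
    J = ((predictions - y) ** 2).sum() / (2 * m)

    return J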
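
A quick way to sanity-check a cost implementation is to use a tiny dataset whose cost can be worked out by hand. The sketch below uses a standalone, hypothetical `cost` function that mirrors the formula for J(θ), together with the points (1, 1), (2, 2), (3, 3). With θ = (0, 0), every prediction is 0, so J = (1 + 4 + 9) / (2 · 3) = 7/3. With θ = (0, 1), the line passes through every point, so J = 0.

```python
import numpy as np

def cost(X, y, theta):
    """Vectorized J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = y.size
    errors = X.dot(theta).flatten() - y
    return (errors ** 2).sum() / (2 * m)

# Design matrix with a leading column of ones for the intercept theta_0
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

j_zero = cost(X, y, np.zeros((2, 1)))           # all predictions are 0
j_exact = cost(X, y, np.array([[0.0], [1.0]]))  # line y = x fits exactly

print(j_zero, j_exact)
```
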

### **Gradient Descent**

Next, fit the linear regression parameters θ to our dataset using **gradient descent**. The objective is to find the θ that minimizes the **cost function** J(θ).
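For reference, differentiating J(θ) gives the standard batch gradient descent update. It is applied simultaneously to $θ_0$ and $θ_1$ on every iteration, where α is the learning rate and $x_0^{(i)} = 1$ for the intercept term:

$$
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j \in \{0, 1\}
$$

In matrix form, with $X$ the $m \times 2$ design matrix, this is $\theta := \theta - \frac{\alpha}{m} X^{\top}(X\theta - y)$.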

In [None]:
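def gradient_descent(X, y, theta, alpha, num_iters):
    '''
    Performs gradient descent to learn theta
    by taking num_iters gradient steps with learning
    rate alpha
    '''
    m = y.size
    J_history = zeros(shape=(num_iters, 1)) # To plot the convergence

    for i in range(num_iters):
        #Residuals h_theta(x) - y as a flat vector of length m
        errors = X.dot(theta).flatten() - y

        #Simultaneous update of every parameter:
        #theta := theta - (alpha / m) * X^T (h_theta(x) - y)
        theta = theta - (alpha / m) * X.T.dot(errors).reshape(theta.shape)

        #Record the cost after this step to monitor convergence
        J_history[i, 0] = compute_cost(X, y, theta)

    return theta, J_history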

With each step of gradient descent, provided the learning rate is small enough, the parameters θ move closer to the optimal values that achieve the lowest cost J(θ).
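One way to check this claim is to run gradient descent on synthetic data and compare the result with the exact least-squares solution from `numpy.linalg.lstsq`. The sketch below is self-contained. The data, learning rate, and iteration count are illustrative assumptions, not the notebook's settings.

```python
import numpy as np

# Hypothetical synthetic data: y = 1 + 2x plus a little Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)

# Design matrix with a column of ones for the intercept
X = np.column_stack((np.ones_like(x), x))
m = y.size

# Batch gradient descent: theta := theta - (alpha/m) * X^T (X theta - y)
theta = np.zeros((2, 1))
alpha, num_iters = 0.1, 5000
for _ in range(num_iters):
    errors = X.dot(theta).flatten() - y
    theta = theta - (alpha / m) * X.T.dot(errors).reshape(theta.shape)

# Exact minimizer of the sum of squared errors, for comparison
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta.flatten(), theta_exact)
```

With these settings the two results should agree to many decimal places. If α is too large, the iterates can diverge instead.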

### **Data Training**

To set up training, we initialize the **fitting parameters θ** to zero and add a column of ones to the data to accommodate the intercept term $θ_0$. We set the learning rate α to 0.01 and run 1,500 iterations.
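Adding the intercept column turns each input $x^{(i)}$ into the vector $(1, x^{(i)})$, so that each row of $X\theta$ equals $θ_0 + θ_1 x^{(i)}$. The training cell builds this matrix by filling a matrix of ones. An equivalent and arguably more idiomatic NumPy form uses `column_stack`, sketched here on hypothetical population values:

```python
import numpy as np

# Hypothetical populations (in 10,000s) standing in for data[:, 0]
x = np.array([6.1, 5.5, 8.5])

# Prepend a column of ones so theta_0 acts as the intercept
X = np.column_stack((np.ones_like(x), x))

# Same matrix as the notebook's ones(shape=(m, 2)) approach
it = np.ones(shape=(x.size, 2))
it[:, 1] = x
assert np.array_equal(X, it)

theta = np.array([[1.0], [2.0]])  # hypothetical parameters
# Each row computes theta_0 * 1 + theta_1 * x_i
print(X.dot(theta).flatten())
```
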

In [None]:
X = data[:, 0]
y = data[:, 1]


#number of training samples
m = y.size

#Add a column of ones to X for the intercept term theta_0
it = ones(shape=(m, 2))
it[:, 1] = X

#Initialize theta parameters
theta = zeros(shape=(2, 1))

#Some gradient descent settings
iterations = 1500
alpha = 0.01

In [None]:
#compute and display initial cost
print(compute_cost(it, y, theta))

In [None]:
#compute the theta values using gradient descent algorithm
theta, J_history = gradient_descent(it, y, theta, alpha, iterations)

print(theta)

### **Predictions**

In [None]:
#Predict values for population sizes of 35,000 and 70,000
predict1 = array([1, 3.5]).dot(theta).flatten().item()
print('For population = 35,000, we predict a profit of %f' % (predict1 * 10000))
predict2 = array([1, 7.0]).dot(theta).flatten().item()
print('For population = 70,000, we predict a profit of %f' % (predict2 * 10000))
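
Because the data is stored in units of 10,000, a population of 35,000 enters the model as $x = 3.5$, and the prediction must be multiplied by 10,000 to recover the profit. A small helper makes this conversion explicit. `predict_profit` and the θ values below are hypothetical and were chosen only so the arithmetic is easy to check: 1 + 2 × 3.5 = 8, that is, a profit of 80,000.

```python
import numpy as np

UNIT = 10_000  # the dataset stores population and profit in 10,000s

def predict_profit(population, theta):
    """Predict profit for a raw population count (hypothetical helper)."""
    x = population / UNIT                       # rescale to the training units
    h = np.array([1.0, x]).dot(theta).item()    # theta_0 + theta_1 * x
    return h * UNIT                             # back to the original units

theta = np.array([[1.0], [2.0]])  # hypothetical parameters, not the fitted ones
print(predict_profit(35_000, theta))
```
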

In [None]:
import matplotlib.pyplot as plt

# Assuming `data` and `theta` are already defined
# Scatter plot of the data
plt.scatter(data[:, 0], data[:, 1], marker='o', c='b', label='Training data')

# Plot the regression line
x_values = linspace(data[:, 0].min(), data[:, 0].max(), 100)  # Generate x values for the line
y_values = array([ones_like(x_values), x_values]).T.dot(theta).flatten()  # Predict y for those x values

plt.plot(x_values, y_values, color='r', label='Regression line')

# Add labels and title
plt.title('Profit Distribution and Regression Line')
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')

# Add a legend
plt.legend()

# Show the plot
plt.show()


### **Contour Plot**

A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it is decreasing with each step. It should converge to a steady value by the end of the algorithm.
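In the notebook, this check means plotting `J_history` (for example, with `plot(J_history)`) and confirming that the curve flattens out. Below is a self-contained sketch of the same check on hypothetical synthetic data. It reuses the notebook's α = 0.01 and 1,500 iterations and asserts that the recorded cost never increases.

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum of squared residuals."""
    errors = X.dot(theta).flatten() - y
    return (errors ** 2).sum() / (2 * y.size)

# Hypothetical data roughly on the line y = -3 + 1.2x
x = np.array([5.0, 6.0, 7.5, 9.0, 12.0, 15.0])
y = -3.0 + 1.2 * x + np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0])
X = np.column_stack((np.ones_like(x), x))

theta = np.zeros((2, 1))
alpha, num_iters = 0.01, 1500
J_history = np.zeros(num_iters)
for i in range(num_iters):
    errors = X.dot(theta).flatten() - y
    theta = theta - (alpha / y.size) * X.T.dot(errors).reshape(theta.shape)
    J_history[i] = cost(X, y, theta)

# With a small enough alpha, J should never go up from one step to the next
assert np.all(np.diff(J_history) <= 1e-12)
print(J_history[0], J_history[-1])
```
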

Another useful visualization is a **contour plot**, which shows how J(θ) varies with $θ_0$ and $θ_1$. The cost function J(θ) is bowl-shaped and has a single global minimum, as the figure below shows. Each step of gradient descent moves closer to this point.
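The bowl shape follows from the form of J(θ). It is quadratic in θ with Hessian $\frac{1}{m} X^{\top} X$, which is positive semidefinite. The Hessian is positive definite whenever the inputs are not all identical, so there is exactly one minimum. Here is a short numerical check on hypothetical inputs:

```python
import numpy as np

# Hypothetical population values (in 10,000s); any two distinct values suffice
x = np.array([5.5, 6.1, 8.5, 7.0, 10.2])
X = np.column_stack((np.ones_like(x), x))
m = x.size

# Hessian of J(theta) = 1/(2m) * ||X theta - y||^2; it depends on neither y nor theta
H = X.T.dot(X) / m

# eigvalsh handles symmetric matrices and returns eigenvalues in ascending order
eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues)  # both strictly positive, so J is a strictly convex bowl
```
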

In [None]:
#Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100)
theta1_vals = linspace(-1, 4, 100)


#initialize J_vals to a matrix of 0's
J_vals = zeros(shape=(theta0_vals.size, theta1_vals.size))

#Fill out J_vals by evaluating the cost at every (theta_0, theta_1) grid point
for i, t0 in enumerate(theta0_vals):
    for j, t1 in enumerate(theta1_vals):
        thetaT = array([[t0], [t1]])
        J_vals[i, j] = compute_cost(it, y, thetaT)

#Contour plot
#contour() expects rows to index theta_1 and columns to index theta_0, so transpose
J_vals = J_vals.T
#Plot J_vals as 20 contour levels spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('theta_0')
ylabel('theta_1')
scatter(theta[0][0], theta[1][0])
show()