# Introduction to Linear Regression
Linear regression is a statistical method used to model the relationship between a dependent variable (target) and one or more independent variables (features). It is widely used for prediction and forecasting in various fields such as economics, finance, and science.

## 1. What is Linear Regression?
Linear regression assumes a linear relationship between the independent variable(s) and the dependent variable: a straight line when there is one feature, or a hyperplane when there are several. The goal is to find the best-fitting linear equation that describes this relationship.

### Applications:
- **Predicting Exam Scores:** Imagine you have data on students' study hours and their corresponding exam scores. You can use linear regression to predict a student's exam score based on the number of hours they studied.

- **Forecasting House Prices:** Suppose you have data on house sizes (in square feet) and their selling prices. You can use linear regression to predict the selling price of a house based on its size.

- **Estimating Gas Mileage:** If you have data on cars' engine sizes and their corresponding gas mileage, you can use linear regression to predict a car's gas mileage based on its engine size.

## 2. Theory Behind Linear Regression

Recall the equation for a straight line from your early math classes,

$$ y = mx + b$$

The equation represents a straight line where $m$ is the slope and $b$ is the y-intercept.

The following Python example uses matplotlib and ipywidgets to plot the line $y = mx + b$ and lets you adjust the values of $m$ and $b$ with sliders.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, FloatSlider, Checkbox

In [2]:
# Function to plot the line
def plot_line(m, b):
    x_vals = np.linspace(0, 10, 100)
    y_vals = m * x_vals + b
    plt.plot(x_vals, y_vals, color='red', label=f'y = {m:.2f}x + {b:.2f}')
    plt.xlabel('X')
    plt.ylabel('y')
    plt.title('y = mx + b')
    plt.legend()
    plt.grid(True)
    plt.show()

# Define sliders for m and b
m_slider = FloatSlider(min=-10, max=10, step=1, value=0, description='Slope (m)')
b_slider = FloatSlider(min=-10, max=10, step=1, value=0, description='Intercept (b)')

# Create interactive plot
interact(plot_line, m=m_slider, b=b_slider)


Now imagine we have data on house sizes (in square feet) and their selling prices. Below, we generate some synthetic data for house sizes and selling prices.

In [3]:
# Generate some random data for house sizes and selling prices
np.random.seed(0)
house_sizes = np.random.randint(1000, 3000, 50)  # House sizes in square feet
prices = 50 * house_sizes + np.random.normal(0, 10000, 50)  # Selling prices

# Create a DataFrame
df = pd.DataFrame({'House Size (sqft)': house_sizes, 'Selling Price': prices})

# Display the DataFrame
df

| | House Size (sqft) | Selling Price |
|---|---|---|
| 0 | 1684 | 58670.101842 |
| 1 | 1559 | 84486.185954 |
| 2 | 2653 | 141294.361989 |
| 3 | 2216 | 103378.349796 |
| 4 | 1835 | 114447.54624 |
| 5 | 1763 | 73606.343254 |
| 6 | 2731 | 137007.585173 |
| 7 | 2383 | 117278.1615 |
| 8 | 2033 | 116977.792144 |
| 9 | 2747 | 152043.587699 |


Now let's plot these data points in a scatter plot.

In [4]:
# Function to plot the data and multiple lines with different fits
def plot_data_and_lines(show_lines, show_legend):
    plt.figure(figsize=(10, 6))
    plt.scatter(house_sizes, prices, color='blue', label='Data')

    # Plot multiple lines with different fits
    if show_lines:
        for i in range(-50, 51, 25):  # Generate 5 lines with different fits
            m = i
            b = np.mean(prices) - m * np.mean(house_sizes)  # Calculate intercept
            x_vals = np.linspace(min(house_sizes), max(house_sizes), 100)
            y_vals = m * x_vals + b
            plt.plot(x_vals, y_vals, label=f'y = {m:.2f}x + {b:.2f}')

    # Show legend if specified
    if show_legend:
        plt.legend()

    plt.xlabel('House Size (sqft)')
    plt.ylabel('Selling Price')
    plt.title('House Size vs Selling Price')
    plt.grid(True)
    plt.show()

# Checkbox widget to show/hide the lines
lines_checkbox = Checkbox(value=False, description='Show Lines')  # Default value set to False

# Checkbox widget to show/hide the legend
legend_checkbox = Checkbox(value=False, description='Show Legend')  # Default value set to False

# Function to update the plot when checkboxes are toggled
def update_plot(show_lines, show_legend):
    plot_data_and_lines(show_lines, show_legend)

# Create interactive plot with checkboxes
interact(update_plot, show_lines=lines_checkbox, show_legend=legend_checkbox)


Let's imagine 5 data scientists are working with the same dataset. If each scientist draws a different line of fit, how do they decide which line is best?

How can we find a simple linear equation that best represents the relationship between the dependent variable, *price*, and the independent variable, *size*? In other words, how do we find the **line of best fit**?

### Line of Best Fit:
The line of best fit represents the linear relationship between the independent variable (predictor) and the dependent variable (response). In our case, the independent variable is the house size (size) and the dependent variable is the selling price (price). 

This line is often determined through linear regression, which aims to minimize **the difference between the observed values and the values predicted by the line**.

### What are Residuals?
Residuals, denoted as $ε$ (epsilon), are the differences between the observed values ($y$) and the values predicted by the model ($\hat{y}$). In other words, they represent the error in the model's predictions. Mathematically, residuals can be expressed as,

$$\begin{aligned}
ε_{i} &= y_{i} - \hat{y}_{i} \\
&= y_{i} - (mx_{i} + b)
\end{aligned}$$

where:
- $ε_{i}$ is the error or residual for the $i$-th data point

- $y_{i}$ is the observed (actual) value for the $i$-th data point

- $\hat{y}_{i}$ is the value predicted by the model for the $i$-th data point

- $m$ represents the slope of the line in a linear regression model

- $x_{i}$ represents the value of the independent variable for the $i$-th data point

- $b$ represents the y-intercept of the line in a linear regression model

A residual is a measure of how well a line fits an individual data point. Consider this simple data set with a line of fit drawn through it.

<p align="center">
  <img src="/workspaces/themarisolhernandez-4geeks-ds-lessons/imgs/residual1.png" alt="Alt text" width="400" height="400">
</p>

Notice how the point **(2, 8)** is **<span style="color:green">4</span>** units above the line:

<p align="center">
  <img src="/workspaces/themarisolhernandez-4geeks-ds-lessons/imgs/residual2.png" alt="Alt text" width="400" height="400">
</p>

This vertical distance is known as a **residual**. For data points above the line, the residual is positive, and for data points below the line, the residual is negative.

For example, the residual for the point **(4, 3)** is **<span style="color:red">-2</span>**.

<p align="center">
  <img src="/workspaces/themarisolhernandez-4geeks-ds-lessons/imgs/residual3.png" alt="Alt text" width="400" height="400">
</p>

The closer a data point's residual is to 0, the better the line fits that point. In this case, the line fits the point (4, 3) better than the point (2, 8), because a residual of -2 is closer to 0 than a residual of 4.
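To make the arithmetic concrete, here is a minimal sketch that recomputes the two residuals discussed above. The line $y = 0.5x + 3$ is an assumption: the figures do not state the line's equation, but this choice is consistent with the residuals of 4 and -2 quoted in the text.

```python
import numpy as np

# Hypothetical line chosen to match the residuals quoted in the text (an assumption)
m, b = 0.5, 3.0

# The two data points discussed above
x = np.array([2.0, 4.0])
y = np.array([8.0, 3.0])

y_hat = m * x + b        # values predicted by the line
residuals = y - y_hat    # observed minus predicted

for xi, yi, ri in zip(x, y, residuals):
    print(f"point ({xi:g}, {yi:g}): residual = {ri:+g}")
```

A positive residual means the point sits above the line, and a negative residual means it sits below.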

### Visualizing Residuals
We can further explore residuals by visualizing how they relate to our linear regression model for our housing dataset. Here, we can adjust the slope ($m$) and intercept ($b$) of the regression line and observe the corresponding residuals.

In [5]:
# Function to plot the data, line, and residuals
def plot_data_line_residuals(m, b, show_line, show_residuals):
    plt.figure(figsize=(12, 6))
    plt.scatter(house_sizes, prices, color='blue', label='Data')

    # Calculate predicted prices using the selected m and b
    predicted_prices = m * house_sizes + b
    
    # Plot the line if show_line is True
    if show_line:
        plt.plot(house_sizes, predicted_prices, color='red', label=f'y = {m}x + {b}')

    # Plot dashed lines representing residuals if show_residuals is True
    if show_residuals:
        for i in range(len(house_sizes)):
            plt.plot([house_sizes[i], house_sizes[i]], [prices[i], predicted_prices[i]], color='green', linestyle='--', linewidth=0.8)

        # Add legend for residuals if not already added
        handles, labels = plt.gca().get_legend_handles_labels()
        if 'Residuals' not in labels:
            plt.plot([], [], color='green', linestyle='--', label='Residuals')

    plt.xlabel('House Size (sqft)')
    plt.ylabel('Selling Price')
    plt.title('House Size vs Selling Price')
    plt.legend()
    plt.grid(True)
    plt.show()

# Define sliders for m and b
m_slider = FloatSlider(min=-100, max=100, step=1, value=0, description='Slope (m)')
b_slider = FloatSlider(min=-50000, max=50000, step=1000, value=0, description='Intercept (b)')

# Checkbox widget to show/hide the line
line_checkbox = Checkbox(value=False, description='Show Line')  # Default value set to False

# Checkbox widget to show/hide the residuals
residuals_checkbox = Checkbox(value=False, description='Show Residuals')  # Default value set to False

# Function to update the plot when checkboxes or sliders are adjusted
def update_plot(show_line, show_residuals, m, b):
    plot_data_line_residuals(m, b, show_line, show_residuals)

# Create interactive plot with sliders and checkboxes
interact(update_plot, show_line=line_checkbox, show_residuals=residuals_checkbox, m=m_slider, b=b_slider)



So how do we know we've found the model parameters ($m$ and $b$) for the **line of best fit**?

### Determining Model Parameters for the Line of Best Fit
#### Method of Least Squares
In linear regression, the model parameters for the line of best fit are determined using the **method of least squares**. This method aims to <u>minimize</u> the sum of the squared differences between the observed values and the values predicted by the regression line.
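As a quick illustration, NumPy's `np.polyfit` performs a least-squares polynomial fit, and with degree 1 it returns the slope and intercept of the fitted line. The sketch below uses a small made-up dataset (not the housing data) whose points lie exactly on $y = 2x + 1$, so we know what the fit should recover.

```python
import numpy as np

# Toy data (hypothetical): points that lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit; coefficients are returned highest power first
m, b = np.polyfit(x, y, 1)
print(f"slope = {m:.3f}, intercept = {b:.3f}")
```

The same call should work on the housing data from earlier in this lesson, for example `np.polyfit(house_sizes, prices, 1)`.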

#### Mathematical Formulation
Given a set of $n$ data points $(x_{i}, y_{i})$, where $x_{i}$ represents the independent variable and $y_{i}$ represents the corresponding dependent variable, the line of best fit is represented by the equation:

$$ \hat{y}_{i} = mx_{i} + b$$

where:
- $m$ is the slope of the line (coefficient for the independent variable $x$)

- $x_{i}$ represents the value of the independent variable for the $i$-th data point

- $b$ is the y-intercept of the line

The goal is to find the values of $m$ and $b$ that minimize the **sum of the squared errors**, denoted as $SSE$:

$$SSE = \sum_{i=1}^{n}ε_{i}^2 = \sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^2 = \sum_{i=1}^{n}(y_{i} - (mx_{i} + b))^2$$

where:
- $\hat{y}_{i}$ is the predicted value of $y_{i}$ at the $i$-th data point
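The $SSE$ formula translates almost directly into NumPy. The following is a minimal sketch on a tiny made-up dataset; the helper name `sse` and the data are assumptions chosen so the sums are easy to verify by hand.

```python
import numpy as np

def sse(m, b, x, y):
    """Sum of squared errors for the line y_hat = m * x + b."""
    residuals = y - (m * x + b)
    return np.sum(residuals ** 2)

# Toy data (hypothetical, for illustration only)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 7.0])

print(sse(2.0, 0.0, x, y))  # line y = 2x: residuals are 0, 0, 1
print(sse(1.0, 0.0, x, y))  # line y = x:  residuals are 1, 2, 4
```

The first line misses only one point by 1 unit, so its $SSE$ is much smaller than that of the second line, which misses every point.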

With our same housing dataset, we can adjust the slope ($m$) and intercept ($b$) of the regression line and observe the corresponding sum of squared errors.

In [6]:
# Function to plot the data, line, residuals, and SSE
def plot_data_line_residuals_sse(m, b, show_line, show_residuals):
    plt.figure(figsize=(12, 6))
    plt.scatter(house_sizes, prices, color='blue', label='Data')

    # Calculate predicted prices using the selected m and b
    predicted_prices = m * house_sizes + b
    
    # Calculate residuals
    residuals = prices - predicted_prices
    
    # Calculate sum of squared residuals if the line is shown
    if show_line:
        sum_squared_residuals = np.sum(residuals**2)
    else:
        sum_squared_residuals = None
    
    # Plot the line if show_line is True
    if show_line:
        plt.plot(house_sizes, predicted_prices, color='red', label=f'y = {m}x + {b}')

    # Plot dashed lines representing residuals if show_residuals is True
    if show_residuals:
        for i in range(len(house_sizes)):
            plt.plot([house_sizes[i], house_sizes[i]], [prices[i], predicted_prices[i]], color='green', linestyle='--', linewidth=0.8)

        # Add legend for residuals if not already added
        handles, labels = plt.gca().get_legend_handles_labels()
        if 'Residuals' not in labels:
            plt.plot([], [], color='green', linestyle='--', label='Residuals')

    plt.xlabel('House Size (sqft)')
    plt.ylabel('Selling Price')
    title = 'House Size vs Selling Price'
    if show_line:
        title += f'\nSum of Squared Errors: {sum_squared_residuals:.2f}'
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.show()

# Define sliders for m and b
m_slider = FloatSlider(min=-100, max=100, step=1, value=0, description='Slope (m)')
b_slider = FloatSlider(min=-50000, max=50000, step=1000, value=0, description='Intercept (b)')

# Checkbox widget to show/hide the line
line_checkbox = Checkbox(value=False, description='Show Line')  # Default value set to False

# Checkbox widget to show/hide the residuals
residuals_checkbox = Checkbox(value=False, description='Show Residuals')  # Default value set to False

# Function to update the plot when checkboxes or sliders are adjusted
def update_plot(show_line, show_residuals, m, b):
    plot_data_line_residuals_sse(m, b, show_line, show_residuals)

# Create interactive plot with sliders and checkboxes
interact(update_plot, show_line=line_checkbox, show_residuals=residuals_checkbox, m=m_slider, b=b_slider)



#### Introduction to Cost Function
Now that we've discussed the method of least squares and the goal of minimizing the sum of squared errors ($SSE$) to find the coefficients for the line of best fit, let's delve deeper into the concept of the **cost function**.

In machine learning and optimization, a **cost function** (also known as a loss function or objective function) is the quantity used to evaluate a model. It measures how far the model's predictions are from the actual observed values in the training dataset.

##### Purpose of Cost Function
The primary purpose of a cost function is twofold:

1. **Evaluation of Model Performance**: By assessing the extent of error or deviation between the predicted and actual values, the cost function provides insights into the efficacy of the model in capturing the underlying patterns and relationships within the data. A lower cost indicates better alignment between predictions and observations, signifying higher model accuracy.

2. **Optimization**: Beyond evaluation, the cost function plays a pivotal role in the optimization process, guiding the iterative adjustment of model parameters to minimize prediction errors. Optimization algorithms, such as gradient descent, leverage the gradient (partial derivatives) of the cost function with respect to the model parameters to iteratively update the parameters and converge towards the optimal solution. More on this later...

#### Cost Function: Sum of Squared Errors ($SSE$)
As we've seen earlier, the sum of squared errors ($SSE$) serves as a measure of the discrepancy between the observed values and the values predicted by our regression line. While $SSE$ is effective in quantifying the overall error, it has some limitations.

#### Limitations of $SSE$
Although $SSE$ provides valuable insight into the model's performance, it does not account for the number of data points in the dataset. As a result, $SSE$ may vary significantly depending on the size of the dataset, making it challenging to compare models trained on different datasets directly.

#### Introducing Mean Squared Error ($MSE$)
To address the limitations of $SSE$, we introduce the concept of **Mean Squared Error** ($MSE$). $MSE$ is obtained by dividing the $SSE$ by the number of data points, resulting in the average squared error per data point. Mathematically, it is written as:

$$ MSE = \frac{1}{n}SSE = \frac{1}{n}\sum_{i=1}^{n}ε_{i}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^2 = \frac{1}{n}\sum_{i=1}^{n}(y_{i} - (mx_{i} + b))^2$$

Dividing by $n$ puts the error on a per-data-point scale, so the cost no longer grows simply because the dataset is larger, which makes it easier to compare across datasets of different sizes. $MSE$ is also easier to interpret: it is the average squared difference between the predicted and actual values.
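The effect of dividing by $n$ can be seen with a small experiment. In this sketch (toy data and helper names are assumptions), we repeat the same points twice: the fit is no better or worse, yet $SSE$ doubles while $MSE$ stays the same.

```python
import numpy as np

def sse(m, b, x, y):
    """Sum of squared errors for the line y_hat = m * x + b."""
    return np.sum((y - (m * x + b)) ** 2)

def mse(m, b, x, y):
    """Mean squared error: SSE divided by the number of data points."""
    return sse(m, b, x, y) / len(x)

# Toy data (hypothetical)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 7.0])

# The same points repeated twice: twice the data, same quality of fit
x2 = np.tile(x, 2)
y2 = np.tile(y, 2)

m, b = 2.0, 0.0
print("SSE:", sse(m, b, x, y), "->", sse(m, b, x2, y2))
print("MSE:", mse(m, b, x, y), "->", mse(m, b, x2, y2))
```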

With our same housing dataset, we can adjust the slope ($m$) and intercept ($b$) of the regression line and observe the corresponding mean squared error ($MSE$).

In [7]:
# Function to plot the data, line, residuals, and MSE
def plot_data_line_residuals_mse(m, b, show_line, show_residuals):
    plt.figure(figsize=(12, 6))
    plt.scatter(house_sizes, prices, color='blue', label='Data')

    # Calculate predicted prices using the selected m and b
    predicted_prices = m * house_sizes + b
    
    # Calculate residuals
    residuals = prices - predicted_prices
    n = len(residuals)
    
    # Calculate mean squared error if the line is shown
    if show_line:
        mean_squared_error = (1/n)*np.sum(residuals**2)
    else:
        mean_squared_error = None
    
    # Plot the line if show_line is True
    if show_line:
        plt.plot(house_sizes, predicted_prices, color='red', label=f'y = {m}x + {b}')

    # Plot dashed lines representing residuals if show_residuals is True
    if show_residuals:
        for i in range(len(house_sizes)):
            plt.plot([house_sizes[i], house_sizes[i]], [prices[i], predicted_prices[i]], color='green', linestyle='--', linewidth=0.8)

        # Add legend for residuals if not already added
        handles, labels = plt.gca().get_legend_handles_labels()
        if 'Residuals' not in labels:
            plt.plot([], [], color='green', linestyle='--', label='Residuals')

    plt.xlabel('House Size (sqft)')
    plt.ylabel('Selling Price')
    title = 'House Size vs Selling Price'
    if show_line:
        title += f'\nMean Squared Error: {mean_squared_error:.2f}'
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.show()

# Define sliders for m and b
m_slider = FloatSlider(min=-100, max=100, step=1, value=0, description='Slope (m)')
b_slider = FloatSlider(min=-50000, max=50000, step=1000, value=0, description='Intercept (b)')

# Checkbox widget to show/hide the line
line_checkbox = Checkbox(value=False, description='Show Line')  # Default value set to False

# Checkbox widget to show/hide the residuals
residuals_checkbox = Checkbox(value=False, description='Show Residuals')  # Default value set to False

# Function to update the plot when checkboxes or sliders are adjusted
def update_plot(show_line, show_residuals, m, b):
    plot_data_line_residuals_mse(m, b, show_line, show_residuals)

# Create interactive plot with sliders and checkboxes
interact(update_plot, show_line=line_checkbox, show_residuals=residuals_checkbox, m=m_slider, b=b_slider)


#### Gradient Descent
Now that we've established $MSE$ as our preferred cost function, let's explore how we can optimize our linear regression model using **gradient descent**. Gradient descent is an iterative optimization algorithm that aims to <u>minimize</u> the cost function ($MSE$) by adjusting the model parameters (slope and intercept).

##### Mathematics Behind Gradient Descent 
Up to this point, we've been using $\hat{y}_{i}$ to represent the predicted value for the $i$-th training example,

$$ \hat{y}_{i} = mx_{i} + b$$

where:
- $\hat{y}_{i}$ is the predicted value for the $i$-th data point

- $m$ is the slope of the line (coefficient for the independent variable $x$)

- $x_{i}$ represents the value of the independent variable for the $i$-th data point

- $b$ is the y-intercept of the line

We can also express the linear regression with the following notation,

$$h_{\theta}(x_{i}) = \theta_{0} + \theta_{1}x_{i}$$

where:

- $h_{\theta}(x_{i})$ is the predicted value for the $i$-th data point

- $\theta_{0}$ corresponds to the y-intercept ($b$)

- $\theta_{1}$ corresponds to the slope ($m$)

- $x_{i}$ represents the value of the independent variable for the $i$-th data point

To minimize the cost function ($MSE$), we need to find the best values of $\theta_{0}$ and $\theta_{1}$. We will find these values using gradient descent in a step-by-step process.

##### Step-by-Step Process of Gradient Descent
1. **Initialization**: We initialize the values of $\theta_{0}$ and $\theta_{1}$ to some random values or zeros. These values represent the parameters of the linear regression model.

2. **Define the Cost Function:** We define the Mean Squared Error ($MSE$) as our cost function. The $MSE$ represents the average squared difference between the predicted and actual values over all data points. We can rewrite our cost function in terms of $h_{\theta}(x_{i})$,

$$ MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^2 = \frac{1}{n}\sum_{i=1}^{n}(y_{i} - h_{\theta}(x_{i}))^2 $$

We can also scale the cost by an extra factor of $\frac{1}{2}$, dividing the sum of squared errors by $2n$ instead of $n$, so that we get a cost function that looks like,

$$ J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}(h_{\theta}(x_{i}) - y_{i})^2$$

The factor $\frac{1}{2}$ is often included for mathematical convenience as it simplifies the derivative computation in the next step. The additional $\frac{1}{n}$ factor adjusts the scaling based on the number of training examples, making the cost function more consistent across datasets of different sizes. 

The subtraction $h_{\theta}(x_{i}) - y_{i}$ is just a rearrangement. Because the difference is squared, the order does not matter: it still quantifies the discrepancy between predicted and actual values, and the optimization goal of minimizing the cost function is unchanged. For example, $(3 - 1)^2 = 2^2 = 4$, which is the same as $(1 - 3)^2 = (-2)^2 = 4$.
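As a quick numerical check of the two points above, the sketch below computes $MSE$ with $y_{i} - h_{\theta}(x_{i})$ and $J(\theta)$ with $h_{\theta}(x_{i}) - y_{i}$ on a tiny made-up dataset. The data and parameter values are arbitrary assumptions; the only thing to notice is that $J(\theta)$ comes out as exactly half of $MSE$.

```python
import numpy as np

# Toy data and arbitrary parameter values (hypothetical)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 7.0])
theta0, theta1 = 0.5, 1.5

h = theta0 + theta1 * x                    # predictions h_theta(x_i)

mse = np.mean((y - h) ** 2)                # uses (y - h)
J = np.sum((h - y) ** 2) / (2 * len(x))    # uses (h - y) and the 1/(2n) factor

print(f"MSE = {mse:.4f}, J = {J:.4f}")
```

Since $J(\theta)$ is just $MSE$ multiplied by a positive constant, the same $\theta_{0}$ and $\theta_{1}$ minimize both.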

3. **Compute the Gradient:** We compute the gradient of the cost function by taking its partial derivative with respect to each parameter.

Partial derivative with respect to $\theta_0$ simplifies to,

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{n}\sum_{i=1}^{n}(h_{\theta}(x_{i}) - y_{i})$$ 

Partial derivative with respect to $\theta_1$ simplifies to,

$$\frac{\partial J(\theta)}{\partial \theta_1} = \frac{1}{n}\sum_{i=1}^{n}(h_{\theta}(x_{i}) - y_{i}) \cdot x_{i}$$ 
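One way to convince ourselves that these two formulas are right is to compare them against a numerical approximation of the derivative. The sketch below does this with central finite differences on a tiny made-up dataset; the helper names, data, and step size `eps` are all assumptions for illustration.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta) = 1/(2n) * sum((h - y)^2)."""
    errors = (theta0 + theta1 * x) - y
    return np.sum(errors ** 2) / (2 * len(x))

def gradient(theta0, theta1, x, y):
    """Analytic partial derivatives from the two formulas above."""
    errors = (theta0 + theta1 * x) - y
    return np.mean(errors), np.mean(errors * x)

# Toy data and arbitrary parameter values (hypothetical)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 7.0])
theta0, theta1 = 0.5, 1.5
eps = 1e-6

# Central finite differences: nudge one parameter at a time
num_g0 = (cost(theta0 + eps, theta1, x, y) - cost(theta0 - eps, theta1, x, y)) / (2 * eps)
num_g1 = (cost(theta0, theta1 + eps, x, y) - cost(theta0, theta1 - eps, x, y)) / (2 * eps)

g0, g1 = gradient(theta0, theta1, x, y)
print(f"d/dtheta0: analytic = {g0:.6f}, numerical = {num_g0:.6f}")
print(f"d/dtheta1: analytic = {g1:.6f}, numerical = {num_g1:.6f}")
```

The analytic and numerical values should agree to several decimal places.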

4. **Set the Learning Rate:** We choose a learning rate, denoted as $\alpha$, which determines the size of the steps we take in the direction of the gradient.

5. **Update the Parameters:** Using the gradient and the learning rate, we update the parameters iteratively. The update rule for each parameter is:

$$\theta_{0} = \theta_{0} - \alpha \cdot \frac{\partial J(\theta)}{\partial \theta_0}$$

$$\theta_{1} = \theta_{1} - \alpha \cdot \frac{\partial J(\theta)}{\partial \theta_1}$$

We apply this update rule to each coefficient, moving them in the direction that reduces the cost function.

6. **Repeat Until Convergence:** We repeat steps 3 through 5 until the cost function converges to a minimum. Convergence is typically checked by monitoring the change in the cost function between iterations. In practice, the loop is also often stopped after a predetermined number of iterations.

7. **Final Parameters:** Once the algorithm converges, the final parameters $\theta_0$ and $\theta_1$ define the best-fit line for our linear regression model.
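The steps above can be sketched compactly on a toy problem, including the convergence check from step 6. Everything specific here is an assumption for illustration: the made-up data (points on $y = 2x + 1$), the learning rate `alpha`, the tolerance `tol`, and the iteration cap `max_iters`.

```python
import numpy as np

# Toy data (hypothetical): points that lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

theta0, theta1 = 0.0, 0.0   # step 1: initialization
alpha = 0.05                # step 4: learning rate (assumed value)
tol = 1e-12                 # stop when the cost changes by less than this
max_iters = 100_000         # safety cap on the number of iterations

def cost(theta0, theta1):
    """Step 2: J(theta) = 1/(2n) * sum((h - y)^2)."""
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * len(x))

prev = cost(theta0, theta1)
for it in range(1, max_iters + 1):
    # Step 3: compute both partial derivatives before changing any parameter
    errors = theta0 + theta1 * x - y
    g0, g1 = np.mean(errors), np.mean(errors * x)

    # Step 5: update the parameters
    theta0 -= alpha * g0
    theta1 -= alpha * g1

    # Step 6: stop once the cost has effectively stopped changing
    current = cost(theta0, theta1)
    if abs(prev - current) < tol:
        break
    prev = current

print(f"stopped after {it} iterations: theta0 = {theta0:.4f}, theta1 = {theta1:.4f}")
```

On this toy data the parameters should end up close to the true intercept of 1 and slope of 2. The learning rate that works here will not necessarily suit data on a very different scale, such as the housing data.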

In [8]:
# Initialize parameters theta0 and theta1
theta0 = 0
theta1 = 0

# Define the number of iterations and learning rate
num_iterations = 100
learning_rate = 0.0000001 

# Define the cost function
def cost_function(theta0, theta1, x, y):
    predictions = theta0 + theta1 * x
    errors = predictions - y
    n = len(x)
    cost = (1 / (2 * n)) * np.sum(errors ** 2)
    return cost

# Compute the gradient of the cost function
def compute_gradient(theta0, theta1, x, y):
    predictions = theta0 + theta1 * x
    errors = predictions - y
    n = len(x)
    gradient_theta0 = (1 / n) * np.sum(errors)
    gradient_theta1 = (1 / n) * np.sum(errors * x)
    return gradient_theta0, gradient_theta1

# Perform gradient descent
for i in range(num_iterations):
    # Compute the gradient
    gradient_theta0, gradient_theta1 = compute_gradient(theta0, theta1, house_sizes, prices)
    
    # Update the parameters
    theta0 -= learning_rate * gradient_theta0
    theta1 -= learning_rate * gradient_theta1
    
    # Compute the cost
    cost = cost_function(theta0, theta1, house_sizes, prices)
    
    # Display the updated parameters and cost
    print(f"Iteration {i + 1}: Theta0 = {theta0}, Theta1 = {theta1}, Cost = {cost}")

# Final parameters
print("\nFinal Parameters:")
print(f"Theta0 = {theta0}, Theta1 = {theta1}")


Iteration 1: Theta0 = 0.00973759559053377, Theta1 = 21.070174748626858, Cost = 1732085737.1576931
Iteration 2: Theta0 = 0.015265495713580827, Theta1 = 33.08447970137416, Cost = 598841366.9699563
Iteration 3: Theta0 = 0.018393009240108678, Theta1 = 39.935088456303106, Cost = 230386246.0854429
Iteration 4: Theta0 = 0.020151811928302902, Theta1 = 43.84133526429951, Cost = 110589334.05637309
Iteration 5: Theta0 = 0.021130169765860025, Theta1 = 46.06869413186921, Cost = 71639406.85028514
Iteration 6: Theta0 = 0.021663514567994143, Theta1 = 47.33874382097777, Cost = 58975500.8660717
Iteration 7: Theta0 = 0.021943111009208028, Theta1 = 48.0629316683883, Cost = 54858047.33385395
Iteration 8: Theta0 = 0.022078019035676726, Theta1 = 48.47586672837551, Cost = 53519327.36871705
Iteration 9: Theta0 = 0.02213042510127954, Theta1 = 48.711324100276975, Cost = 53084065.333076894
Iteration 10: Theta0 = 0.022135788191480064, Theta1 = 48.84558292539911, Cost = 52942547.26772601
Iteration 11: Theta0 = 0.02