In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### Linear Regression Functions
__When building the linear regression model, we came across several new functions. One of these functions is shown below. What is the name of this function?__
$$J = \frac{1}{n} \sum_{i} (y_{i} - \hat{y}_{i})^2$$

Answer: __cost function__

Explanation: In the context of linear regression, the cost function measures how well the model's predictions $\hat{y}_i$ match the actual values $y_i$. Specifically, it computes the average of the squared differences (errors) between the actual and the predicted values. This particular form of the cost function is commonly called the __mean squared error (MSE)__, and minimizing it is known as the __least squares__ approach.
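
As a minimal sketch of the formula above, the cost can be computed directly with NumPy. The function name `mean_squared_error` and the sample values are illustrative assumptions, not part of the original exercise.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical values, chosen only for illustration
y_actual = [3.0, 5.0, 7.0]
y_predicted = [2.0, 5.0, 9.0]

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4
cost = mean_squared_error(y_actual, y_predicted)
print(cost)
```

A perfect model would give a cost of zero; larger errors are penalized quadratically.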

#### Income, Part 1
__We have collected data from an ice cream shop. We modelled the income as a function of the outside temperature (shown below). Which of the following is / are true, based on this research only?__
$$\text{income}\,[\$] = 20.67449411 \cdot T\,[^\circ\text{C}] - 30.12047857$$

Correct answers:
- Increasing temperature increases ice cream sales
- Increased temperature is correlated with increased ice cream sales
- Decreased temperature is correlated with decreased ice cream sales

#### Income, Part 2
__In some cases we need to augment (extend) the model to return valid results. What income (in dollars) will our current model predict when the temperature is 1.2 degrees? Round your answer to 2 decimal places.__

Answer: −5.31
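
The arithmetic behind this answer can be checked with a short sketch. The coefficients are copied from the model equation in Part 1; the names `SLOPE`, `INTERCEPT` and `predict_income` are assumptions made for illustration.

```python
# Coefficients copied from the fitted model in "Income, Part 1"
SLOPE = 20.67449411       # dollars per degree Celsius
INTERCEPT = -30.12047857  # dollars

def predict_income(temperature_c):
    """Raw linear prediction of income (in dollars) for a temperature in deg C."""
    return SLOPE * temperature_c + INTERCEPT

raw_prediction = predict_income(1.2)
print(round(raw_prediction, 2))  # -5.31
```

The unmodified linear model happily returns a negative income, which is why the next part asks for an augmented model.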

#### Income, Part 3
__The specification states that "income" is defined as being non-negative. The model does not account for operational costs or anything like that. We need to return a valid value based on our specification. What income (in dollars) should an augmented model predict for T = 1.2 deg C? Round your answer to 2 decimal places.__

Answer: 0.00
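
One possible way to augment the model is to clip negative predictions to zero. This is only a sketch of that idea; the function name `predict_income_augmented` is hypothetical, and the coefficients are again copied from Part 1.

```python
# Coefficients copied from the fitted model in "Income, Part 1"
SLOPE = 20.67449411       # dollars per degree Celsius
INTERCEPT = -30.12047857  # dollars

def predict_income_augmented(temperature_c):
    """Linear prediction, clipped at zero so that income is never negative."""
    return max(0.0, SLOPE * temperature_c + INTERCEPT)

print(f"{predict_income_augmented(1.2):.2f}")   # 0.00
print(f"{predict_income_augmented(10.0):.2f}")  # 176.62
```

For temperatures where the linear part is already positive, the augmented model returns exactly the same value as the original one.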

#### Local Minima
__When performing gradient descent on a linear regression, the choice of starting point is really important. If we choose a starting point which is far away from the global minimum of the error function, we can get stuck in a local minimum.__

Answer: False
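
The statement is false because the mean squared error of a linear model is a convex (quadratic) function of the model parameters, so its only minimum is the global one. The sketch below illustrates this on synthetic data: gradient descent started from very different points ends up at the same parameters. The data, learning rate, step count and starting points are all assumptions chosen for the demonstration.

```python
import numpy as np

# Synthetic data (assumed for illustration): y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

def gradient_descent(a, b, learning_rate=0.01, steps=20000):
    """Minimize the MSE of y_hat = a * x + b, starting from (a, b)."""
    for _ in range(steps):
        error = (a * x + b) - y
        grad_a = 2 * np.mean(error * x)  # dJ/da
        grad_b = 2 * np.mean(error)      # dJ/db
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b

# Starting points that are deliberately far apart
starts = [(-100.0, 100.0), (0.0, 0.0), (50.0, -75.0)]
results = [gradient_descent(a0, b0) for a0, b0 in starts]

for (a0, b0), (a, b) in zip(starts, results):
    print(f"start=({a0}, {b0}) -> a={a:.4f}, b={b:.4f}")
```

A poor starting point can still make convergence slower, but it cannot trap the algorithm in a local minimum for this cost function.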

#### Multiple Regression, Part 1
__As we already saw, we can do linear regression on many variables. The Boston housing dataset is really famous and is often used for this purpose. You can download it online or, better, load it using scikit-learn (look up how). Note: This dataset is cleaned and prepared for modelling. If you want to download the original one and prepare it yourself, you're in for quite a challenge :). Now, perform linear regression on all features. What is the coefficient related to the number of rooms? Round your answer to two decimal places.__

In [3]:
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

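# Load the Boston housing dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell only runs on older versions (see the fetch_openml cell below)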
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = boston.target

# Fit a linear regression model
model = LinearRegression()
model.fit(X, y)

# Retrieve the coefficient related to the number of rooms (RM)
rm_index = list(X.columns).index('RM')
rm_coefficient = model.coef_[rm_index]

# Round the coefficient to two decimal places
rm_coefficient_rounded = round(rm_coefficient, 2)

rm_coefficient_rounded


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np


        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_h

3.81

In [6]:
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression

# Load the Boston housing dataset
boston = fetch_openml(name='boston', version=1, as_frame=True)
X = boston.data
y = boston.target

# Fit a linear regression model
model = LinearRegression()
model.fit(X, y)

# Retrieve the coefficient related to the number of rooms (RM)
rm_index = list(X.columns).index('RM')
rm_coefficient = model.coef_[rm_index]

# Round the coefficient to two decimal places
rm_coefficient_rounded = round(rm_coefficient, 2)

rm_coefficient_rounded



3.81

Answer: 3.81

#### Multiple Regression, Part 2
__What is the price of a hypothetical house with all variables set to zero? Round your answer to two decimal places.__

In [9]:
# Retrieve the intercept
intercept = model.intercept_

# Round the intercept to two decimal places
intercept_rounded = round(intercept, 2)

intercept_rounded

36.46

Answer: 36.46
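
The intercept is the answer here because a linear model predicts $\hat{y} = b + \sum_j w_j x_j$, and every term of the sum vanishes when all features are zero. The following sketch checks this on synthetic data (the data and the names `X_demo`, `y_demo` and `demo_model` are assumptions, not the Boston dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic regression data, assumed only for this illustration
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=100)

demo_model = LinearRegression().fit(X_demo, y_demo)

# A hypothetical sample with every feature set to zero
all_zeros = np.zeros((1, X_demo.shape[1]))
prediction_at_zero = demo_model.predict(all_zeros)[0]

print(prediction_at_zero, demo_model.intercept_)
```

Both printed values should agree up to floating-point precision.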

#### Multiple Regression, Part 3
__It's good to have a model of the data but it means nothing if we have no way of testing it. A way to test regression algorithms involves the so-called "coefficient of determination" (R^2). Research how to compute it and apply it to the regression model you just created. What is the coefficient of determination for this model? Round your answer to two decimal places. (Note: Compute the coefficient of determination using all the data. Technically, this is not correct but at least gives a good idea of how this model performs. If you're more interested, look up "training and testing set".)__

In [10]:
# Compute the coefficient of determination (R^2)
r_squared = model.score(X, y)

# Round the R^2 value to two decimal places
r_squared_rounded = round(r_squared, 2)

r_squared_rounded


0.74

Answer: 0.74
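
For reference, the coefficient of determination is $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, which is what `model.score` returns for a `LinearRegression`. The sketch below computes it by hand on synthetic data and also shows the train/test evaluation mentioned in the question. The data, the variable names and the 25% test size are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data, assumed only for this illustration
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([3.0, -1.0, 0.0, 2.0]) + rng.normal(scale=1.0, size=200)

# R^2 on all the data, computed manually from its definition
demo_model = LinearRegression().fit(X_demo, y_demo)
y_pred = demo_model.predict(X_demo)
ss_res = np.sum((y_demo - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot
print(r2_manual, demo_model.score(X_demo, y_demo))

# A fairer estimate: fit on a training set, score on unseen test data
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
split_model = LinearRegression().fit(X_train, y_train)
r2_test = split_model.score(X_test, y_test)
print(r2_test)
```

The score on held-out data is the better indicator of how the model will behave on new observations, since the score on the training data tends to be optimistic.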