In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression as LR

### Start with some simple data about houses (with their prices)

In [2]:
houses = pd.DataFrame({
    'bedrooms':[1,2,3,4,5],
    'bathrooms':[1,1,2,2,3],
    'sqft':[700,1000,1300,1700,2200],
    'price':[99500, 117000, 136500, 156500, 181000]
})
houses

|   | bedrooms | bathrooms | sqft | price |
|---|---|---|---|---|
| 0 | 1 | 1 | 700 | 99500 |
| 1 | 2 | 1 | 1000 | 117000 |
| 2 | 3 | 2 | 1300 | 136500 |
| 3 | 4 | 2 | 1700 | 156500 |
| 4 | 5 | 3 | 2200 | 181000 |


### Make a very simple linear model

In [3]:
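# Features: every column except the last ('price'), as a NumPy array like new_house below; target: 'price'
model = LR().fit(houses.iloc[:, :-1].to_numpy(), houses['price'])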

### The intercept: $\beta_0$

#### What does $\beta_0$ represent?

In [4]:
model.intercept_

70000.00000000036
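One way to read $\beta_0$: it is the price the model predicts for a hypothetical house whose features are all zero (0 bedrooms, 0 bathrooms, 0 sqft). The sketch below checks this by predicting on an all-zero row; it rebuilds the same `houses` data so it runs on its own, and `zero_house` is a name introduced here purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression as LR

houses = pd.DataFrame({
    'bedrooms': [1, 2, 3, 4, 5],
    'bathrooms': [1, 1, 2, 2, 3],
    'sqft': [700, 1000, 1300, 1700, 2200],
    'price': [99500, 117000, 136500, 156500, 181000],
})
# Features are every column except 'price'
X = houses.iloc[:, :-1].to_numpy()
model = LR().fit(X, houses['price'])

# Hypothetical house: one row with every feature set to zero
zero_house = np.zeros((1, X.shape[1]))
zero_prediction = model.predict(zero_house)[0]

# With all features at zero, beta . x vanishes and only the intercept remains
print(zero_prediction, model.intercept_)
print(np.isclose(zero_prediction, model.intercept_))
```

A zero-bedroom, zero-square-foot house is not a realistic listing, so $\beta_0$ is best read as the baseline the other terms are added to rather than as a price anyone would pay.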

### The coefficients: $\overrightarrow{\beta}$

#### What is $\overrightarrow{\beta}$?
#### How many elements are in $\overrightarrow{\beta}$?
#### What does each element in $\overrightarrow{\beta}$ represent?

In [5]:
model.coef_

array([10000.,  2000.,    25.])
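$\overrightarrow{\beta}$ holds one coefficient per feature column, in the same order as the columns (`bedrooms`, `bathrooms`, `sqft`). Each coefficient is the change in predicted price when that feature increases by one unit and the others stay fixed. The sketch below illustrates this with two hypothetical houses (`base_house`, `bigger_house`, both invented for illustration) that differ only by one bedroom; it rebuilds the data so it runs standalone.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression as LR

houses = pd.DataFrame({
    'bedrooms': [1, 2, 3, 4, 5],
    'bathrooms': [1, 1, 2, 2, 3],
    'sqft': [700, 1000, 1300, 1700, 2200],
    'price': [99500, 117000, 136500, 156500, 181000],
})
X = houses.iloc[:, :-1].to_numpy()
model = LR().fit(X, houses['price'])

# One coefficient per feature column
print(model.coef_.shape)  # (3,)

# Hypothetical houses: identical except for one extra bedroom
base_house = np.array([[3, 2, 1500]])
bigger_house = np.array([[4, 2, 1500]])

# The prediction difference equals the bedrooms coefficient (beta_1)
price_difference = model.predict(bigger_house)[0] - model.predict(base_house)[0]
print(price_difference, model.coef_[0])
```

Repeating the experiment with one extra bathroom or one extra square foot would isolate the second and third coefficients in the same way.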

### Let's create a new house and predict its price:

In [6]:
# One row, columns in training order: bedrooms, bathrooms, sqft
new_house = np.array([[4, 3, 1800]])

#### The model predicts using the `.predict()` method

In [7]:
model_prediction = model.predict(new_house)
model_prediction

array([161000.])

#### We can make the same prediction using the dot product:

$\hat{y} = \beta_0+{\overrightarrow\beta}\cdot{\overrightarrow x}$
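Written out term by term for `new_house` (4 bedrooms, 3 bathrooms, 1800 sqft), with $\beta_1, \beta_2, \beta_3$ the elements of $\overrightarrow\beta$ for bedrooms, bathrooms, and sqft, and the fitted values above rounded to whole numbers:

$$
\begin{aligned}
\hat{y} &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \\
        &= 70000 + 10000 \cdot 4 + 2000 \cdot 3 + 25 \cdot 1800 \\
        &= 70000 + 40000 + 6000 + 45000 \\
        &= 161000
\end{aligned}
$$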

In [8]:
dot_product_prediction = model.intercept_ + new_house @ model.coef_
dot_product_prediction

array([161000.])
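The same dot product scales to many houses at once: stacking feature rows into a matrix $X$ turns it into a matrix-vector product, $\hat{y} = \beta_0 + X\overrightarrow\beta$, with $\beta_0$ added to every row. The sketch below, which rebuilds the data so it runs on its own, compares that manual computation with `model.predict` for all five training houses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression as LR

houses = pd.DataFrame({
    'bedrooms': [1, 2, 3, 4, 5],
    'bathrooms': [1, 1, 2, 2, 3],
    'sqft': [700, 1000, 1300, 1700, 2200],
    'price': [99500, 117000, 136500, 156500, 181000],
})
X = houses.iloc[:, :-1].to_numpy()
y = houses['price'].to_numpy()
model = LR().fit(X, y)

# (5, 3) @ (3,) -> (5,): one dot product per house, then the intercept is broadcast onto each
manual_predictions = model.intercept_ + X @ model.coef_
sklearn_predictions = model.predict(X)

print(manual_predictions)
print(np.allclose(manual_predictions, sklearn_predictions))
```

Because this toy dataset happens to be exactly linear, the predictions also reproduce the `price` column; real data would leave some gap between $\hat{y}$ and $y$.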

In [10]:
# Exact equality holds here; np.isclose is the safer test for floats in general
print(model_prediction == dot_product_prediction)

[ True]


## Conclusion:

#### A model is *fit* once we have determined values for $\beta_0$ and ${\overrightarrow\beta}$ that it can use to make predictions.
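`LinearRegression` is ordinary least squares, so the same values can be recovered directly with NumPy. In the sketch below (an illustration, not necessarily how scikit-learn computes them internally), a column of ones is prepended to the features so that $\beta_0$ is solved for alongside $\overrightarrow\beta$, and `np.linalg.lstsq` finds all four values at once.

```python
import numpy as np
import pandas as pd

houses = pd.DataFrame({
    'bedrooms': [1, 2, 3, 4, 5],
    'bathrooms': [1, 1, 2, 2, 3],
    'sqft': [700, 1000, 1300, 1700, 2200],
    'price': [99500, 117000, 136500, 156500, 181000],
})
X = houses.iloc[:, :-1].to_numpy(dtype=float)
y = houses['price'].to_numpy(dtype=float)

# Prepend a column of ones so the first solved value plays the role of beta_0
X_with_ones = np.column_stack([np.ones(len(X)), X])

# Least squares: choose params minimizing the sum of squared differences between X_with_ones @ params and y
params, residuals, rank, singular_values = np.linalg.lstsq(X_with_ones, y, rcond=None)
beta_0, beta = params[0], params[1:]
print(beta_0, beta)
```

Up to floating-point noise, `beta_0` and `beta` should match `model.intercept_` and `model.coef_` from the cells above.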

## Question:

#### How do we know when we've picked the right set of values?